traceroute66 8 days ago

> When talking with reasonable people

When talking with reasonable people, they will tell you if they don't understand what you're saying.

When talking with reasonable people, they will tell you if they don't know the answer or if they are unsure about their answer.

LLMs do none of that.

They will very happily, and very confidently, spout complete bullshit at you.

It is essentially a lotto draw as to whether the answer is hallucinated, completely wrong, subtly wrong, not ideal, sort of right or correct.

An LLM is a bit like one of those spin-the-wheel game shows on TV, really.

bbarn 8 days ago

They will also not be offended or harbor ill will when you completely reject their "pull request" and rephrase the requirements.

the_af 7 days ago

They will also keep going in circles when you rephrase the requirements, unless with every prompt you keep adding to it and mentioning everything they've already suggested that got rejected. While humans occasionally also do this (hey, short memories), LLMs are infuriatingly more prone to it.

A typical interaction with an LLM:

"Hey, how do I do X in Y?"

"That's a great question! A good way to do X in Y is Z!"

"No, Z doesn't work in Y. I get this error: 'Unsupported operation Z'."

"I apologize for making this mistake. You're right to point out Z doesn't work in Y. Let's use W instead!"

"Unfortunately, I cannot use W for company policy reasons. Any other option?"

"Understood: you cannot use W due to company policy. Why not try to do Z?"

"I just told you Z isn't available in Y."

"In that case, I suggest you do W."

"Like I told you, W is unacceptable due to company policy. Neither W nor Z work."

...

"Let's do this. First, use Z [...]"

abalashov 7 days ago

It's my experience that once you are in this territory, the LLM is not going to be helpful and you should abandon the effort to get what you want out of it. I can smell blood now when it's wrong; it'll just keep being wrong, cheerfully, confidently.

the_af 7 days ago

Yes, to be honest I've also learned to notice when it's stuck in an infinite loop.

It's just frustrating, but when I'm asking it something within my domain of expertise, of course I can notice, and either call it quits or start a new session with a radically different prompt.

lupire 7 days ago

Which LLMs and which versions?

daveguy 7 days ago

All. Of. Them. It's quite literally what they do because they are optimistic text generators. Not correct or accurate text generators.

e3bc54b2 7 days ago

This really grinds my gears. The technology is inherently faulty, but the relentless optimism about its future subtly hides that by making every failure the user's mistake instead.

Oh, you got a wrong answer? Did you try the new OpenAI v999? Did you prompt it correctly? It's definitely not the model, because it worked for me once last night...

traceroute66 7 days ago

> it worked for me once last night...

This!

Yeah, it probably "worked for me" because they spent a gazillion hours engaging in what the LLM fanbois call "prompt engineering", but what you and I would call "endless iterative hacky workarounds until you find a prompt that works".

Unless it's something extremely simple, the chances of an LLM giving you a workable answer on the first attempt are microscopic.

Aeolun 7 days ago

Most optimistic text generators do not consider repeating the stuff that was already rejected a desirable path forward. It might be the only path forward they're aware of, though.

the_af 7 days ago

In some contexts I got ChatGPT to answer "I don't know" when I crafted a very specific prompt stating that not knowing is an acceptable and preferable answer to bullshitting. But it's hit and miss, and doesn't always work; it seems LLMs simply aren't trained to model admitting ignorance, they almost always want to give a positive and confident answer.
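
For what it's worth, here is a minimal sketch of the kind of prompt I mean, assuming the OpenAI Python SDK; the exact system prompt wording, the model name, and the example question are all illustrative, and as I said the effect is hit and miss:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Illustrative system prompt: make "I don't know" an explicitly
    # acceptable and preferable answer to a confident guess.
    messages = [
        {
            "role": "system",
            "content": (
                "If you are not confident in an answer, reply with "
                "\"I don't know\" instead of guessing. An honest "
                "\"I don't know\" is always preferable to a "
                "plausible-sounding but wrong answer."
            ),
        },
        {"role": "user", "content": "How do I do X in Y?"},
    ]

    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(response.choices[0].message.content)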

seunosewa 7 days ago

You can use prompts to fix some of these problematic tendencies.

mike_ivanov 7 days ago

Yes, you can, but it almost never works.

johnb231 7 days ago

I think you are a couple of years out of date.

No longer an issue with the current SOTA reasoning models.

otabdeveloper4 7 days ago

Throwing more parameters at the problem does absolutely nothing to fix the hallucination and bullshit issue.

johnb231 7 days ago

Correct, and it wasn't fixed with more parameters. Reasoning models question their own output, and all of the current models can verify their sources online before replying. They are not perfect, but they are much better than they used to be, and it is practically not an issue most of the time. I have seen the reasoning models correct their own output while it is being generated. Gemini 2.5 Pro, GPT-o3, Grok 3.