I recently saw a new LLM that was fooled by "20 pounds of bricks vs 20 feathers". These are not reasoning machines.
I recently had a computer tell me that 0.1 + 0.2 != 0.3. It must not be a math-capable machine.
Perhaps it is more important to know the limitations of tools rather than dismiss their utility entirely due to the existence of limitations.
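For what it's worth, the 0.1 + 0.2 example is just standard IEEE 754 floating-point behaviour, trivially reproducible in Python:

```python
# 0.1 and 0.2 have no exact binary representation, so their
# double-precision sum is not exactly 0.3.
print(0.1 + 0.2)           # 0.30000000000000004
print(0.1 + 0.2 == 0.3)    # False

# The usual fix is an approximate comparison.
import math
print(math.isclose(0.1 + 0.2, 0.3))  # True
```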
A computer isn't a math-capable machine.
> Perhaps it is more important to know the limitations of tools rather than dismiss their utility entirely due to the existence of limitations.
Well, yes. And "reasoning" is only something LLMs do incidentally to their function as sequence-continuation engines. Like performing accurate math on rational numbers, it can happen if you put in a lot of work and accept a LOT of expensive computation. Even then, there exist computations that just are not reasonable or feasible.
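To illustrate the "a lot of work" point: exact rational arithmetic is available if you pay for it. A minimal sketch using Python's fractions module (illustrative, not a benchmark):

```python
from fractions import Fraction

# Exact rational arithmetic: every value carries an arbitrary-precision
# numerator/denominator, so results are exact, but each operation costs
# far more than a single hardware float add.
total = Fraction(1, 10) + Fraction(2, 10)
print(total)                     # 3/10
print(total == Fraction(3, 10))  # True, exactly
```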
Reminding folks to dismiss the massive propaganda engine pushing this bubble isn't "dismissing their utility entirely".
These are not reasoning machines. Treating them like they are will get you hurt eventually.
My point is that computers, when used properly, can absolutely do math. And LLMs, when used properly, can absolutely explain the reasoning behind why a pound of bricks and a pound of feathers weigh the same.
Can they reason? Maybe, depending on your definition of reasoning.
An example: which weighs more, a pound of bricks or 453.59 grams of feathers? Explain your reasoning.
LLM: The pound of bricks weighs slightly more.
*Reasoning:*
* *1 pound* is officially defined as *0.45359237 kilograms*, which is *453.59237 grams*.
* You have *453.59 grams* of feathers.
So, the pound of bricks (453.59237 grams) weighs a tiny fraction more than the 453.59 grams of feathers. For most practical purposes, they'd be considered the same, but technically, the bricks are heavier by 0.00237 grams. /llm
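The arithmetic in that answer checks out; a quick sanity check with exact decimal arithmetic:

```python
from decimal import Decimal

pound_in_grams = Decimal("453.59237")   # 1 lb is defined as 0.45359237 kg
feathers_grams = Decimal("453.59")
print(pound_in_grams - feathers_grams)  # 0.00237 grams in favour of the bricks
```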
The answer is correct and the reasoning is sound. Do I understand that the machine is a pattern-following machine? Yes! Is there an argument to be made that humans are also that? Probably. Chomsky himself argued in favor of a universal grammar, after all.
I’m steelmanning this a bit, but the point is that LLMs are capable of doing some things which are indistinguishable from human reasoning in terms of results. Does the process matter in all cases?
> Does the process matter in all cases?
So there are two dimensions being conflated here:
"Does how the reasoning works matter in all cases?" Pretty obviously no, but it may matter in some of them. We also don't really understand which ones yet.
"Does the reasoning work as intended in all cases?" Pretty obviously no; it fails for at least some of them. We also don't really understand which ones yet.
"We also don't really understand which ones yet" is the critical point of caution.
Surely it just reasoned that you made a typo and "autocorrected" your riddle. Isn't this what a human would do? Though to be fair, a human would ask you again to make sure they heard you correctly. But it would be kind of annoying if you had to verify every typo when using an LLM.
Tons of people fall for this too. Are they not reasoning? LLMs can also be bad reasoning machines.
I don't have much use for a bad reasoning machine.
I could retort with another gotcha argument, but perhaps we can do better than that?
An attempt: They are bad reasoning machines that are already useful in a few domains, and they're improving faster than evolutionary speeds. So even if they're not useful today in a domain relevant to you, there's a significant possibility they might be in a few months. AlphaEvolve would have been sci-fi a decade ago.
"It's like if a squirrel started playing chess and instead of "holy shit this squirrel can play chess!" most people responded with "But his elo rating sucks""
I can think of tons of uses for a bad reasoning machine as long as it’s cheap enough.
Which those things aren't. In fact, they cost considerably more than hiring someone.
LLMs cost significantly less than even a high schooler
Only because, for now, they are burning money and the product is priced considerably under what it costs them.
Which is why I spoke of "cost" not of "price".
They're in the "disrupt" phase. But that's not forever.
No. The marginal cost of an LLM is much, much lower than that of a high schooler. It is not even close. There is a lot of investment happening, but either revenue will continue to increase as the product improves and more people use it, or the money will stop flowing. If training stopped, LLMs would be immensely profitable right now.
But are you aware of the weight comparison of a gallon of water vs. a gallon of butane?
No, I'm not. A gallon is a measure of volume, isn't it? That's a US unit.
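For anyone curious, the gallon question comes down to density. A rough back-of-the-envelope sketch, using approximate room-temperature densities (my numbers, not from the thread):

```python
# A US gallon is exactly 3.785411784 litres; the densities below are
# approximate room-temperature values, assumed here for illustration.
US_GALLON_LITRES    = 3.785411784
WATER_KG_PER_LITRE  = 0.997   # water, ~25 °C
BUTANE_KG_PER_LITRE = 0.573   # liquid butane, ~25 °C

print(f"gallon of water:  ~{US_GALLON_LITRES * WATER_KG_PER_LITRE:.2f} kg")   # ~3.77 kg
print(f"gallon of butane: ~{US_GALLON_LITRES * BUTANE_KG_PER_LITRE:.2f} kg")  # ~2.17 kg
```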