> Can LLMs actually parse human languages?
IMHO, no, they have nothing approaching understanding. It's Chinese Rooms[1] all the way down, just with lots of bells and whistles. Spicy autocomplete.
Actually, LLMs made me realize John Searle's "Chinese room" doesn't make much sense.
Languages share many of the same concepts, so the operator inside the Chinese room could come to understand nearly all of them without ever speaking Chinese.
And an LLM can translate to and from any language trivially; it's the inner layers that do the actual understanding of concepts.
Go ask the operator of a Chinese room to do some math they weren't taught in school, and see if the translation guide helps.
The analogy I've used before is a bright first-grader named Johnny. Johnny stumbles across a high school algebra book. Unless Johnny's last name is von Neumann, he isn't going to get anything out of that book. An LLM will.
So much for the Chinese Room.
> Go ask the operator of a Chinese room to do some math they weren't taught in school, and see if the translation guide helps.
That analogy only holds if LLMs can solve novel problems that provably do not appear, in any form, in their training material.
They do. Spend some time using a modern reasoning model. There is a class of interesting problems, nestled between trivial ones whose answers can simply be regurgitated and difficult ones that either yield nonsense or involve tool use, that transformer networks can absolutely, incontrovertibly reason about.
Have any LLMs solved any of the big (or even lesser-known) unanswered problems in math, physics, or computer science?
It may appear that they are solving novel problems, but given the size of their training set, they have probably seen them already. There are very few questions a person can come up with that haven't already been asked and answered somewhere.
Google's AlphaEvolve recently produced a novel matrix multiplication algorithm slightly faster than the previous state of the art, one that couldn't have been in any training data. While not a hard unsolved problem, I think it's good evidence that an LLM is capable of synthesizing new solutions to problems.
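To be concrete about what "slightly faster" means here: these results cut the number of scalar multiplications needed. The toy sketch below is Strassen's classic 2x2 scheme (7 multiplications instead of the naive 8), not AlphaEvolve's larger construction; it's only meant to illustrate the kind of artifact being discovered, and it's easy to verify mechanically.

    package main

    import "fmt"

    // strassen2x2 multiplies two 2x2 matrices using Strassen's 7 scalar
    // multiplications instead of the naive 8. Applying schemes like this
    // recursively to block matrices is what yields asymptotic speedups.
    func strassen2x2(a, b [2][2]float64) [2][2]float64 {
        m1 := (a[0][0] + a[1][1]) * (b[0][0] + b[1][1])
        m2 := (a[1][0] + a[1][1]) * b[0][0]
        m3 := a[0][0] * (b[0][1] - b[1][1])
        m4 := a[1][1] * (b[1][0] - b[0][0])
        m5 := (a[0][0] + a[0][1]) * b[1][1]
        m6 := (a[1][0] - a[0][0]) * (b[0][0] + b[0][1])
        m7 := (a[0][1] - a[1][1]) * (b[1][0] + b[1][1])
        return [2][2]float64{
            {m1 + m4 - m5 + m7, m3 + m5},
            {m2 + m4, m1 - m2 + m3 + m6},
        }
    }

    func main() {
        a := [2][2]float64{{1, 2}, {3, 4}}
        b := [2][2]float64{{5, 6}, {7, 8}}
        fmt.Println(strassen2x2(a, b)) // [[19 22] [43 50]]
    }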
Reason about: sure. Independently solve novel ones without extreme amounts of guidance: I have yet to see it.
Granted, for most language and programming tasks, you don’t need the latter, only the former.
99.9% of humans will never solve a novel problem. It's a bad benchmark to use here.
But they will solve a problem novel to them, since they haven't read all of the text that exists.
I agree. But it's worth being somewhat skeptical of ASI scenarios when you can give a well-formulated math problem to an LLM and it cannot solve it. Until we get a Riemann hypothesis calculator (or the equivalent for other hard, old unsolved maths problems), it's kind of silly to be debating the extreme ends of AI cognition theory.
"I'm taking this talking dog right back to the pound. It completely whiffed on both Riemann and Goldbach. And you should see the buffer overflows in the C++ code it wrote for me."
I have been able to get ChatGPT to synthesize at the edges of two domains in idea-space, say, psychology and economics, but surprisingly it struggled to help me write ODE code in Go (something on the order of the sketch below). In the first case, I think it actually synthesized. In the latter, it couldn't pull enough ideas from the two fields together into one.
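For reference, the kind of thing I had in mind was nothing exotic - a fixed-step solver roughly like this (the names and the dy/dt = -y test problem are just placeholders, not my actual task):

    package main

    import (
        "fmt"
        "math"
    )

    // rk4Step advances dy/dt = f(t, y) by one fixed step h using the
    // classical fourth-order Runge-Kutta method.
    func rk4Step(f func(t, y float64) float64, t, y, h float64) float64 {
        k1 := f(t, y)
        k2 := f(t+h/2, y+h*k1/2)
        k3 := f(t+h/2, y+h*k2/2)
        k4 := f(t+h, y+h*k3)
        return y + h*(k1+2*k2+2*k3+k4)/6
    }

    func main() {
        // Toy problem: dy/dt = -y with y(0) = 1, exact solution e^-t.
        f := func(_, y float64) float64 { return -y }
        y, t, h := 1.0, 0.0, 0.01
        for i := 0; i < 100; i++ {
            y = rk4Step(f, t, y, h)
            t += h
        }
        fmt.Printf("y(1) ≈ %.4f (exact e^-1 ≈ %.4f)\n", y, math.Exp(-1))
    }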
How can you distinguish "I think it did something really impressive in the first case but not the second" from "it spat out something that looked interesting in both cases, but in the latter case there was an objective criterion that exposed a lack of true understanding"?
It's famously easier to impress people with soft-sciences speculation than it is to impress the rules of math or compilers.
I think people give training data too much credit. Obviously it's important, but it also isn't a database of knowledge like it's made out to be.
You can see this in riddles that are obviously in the training set, but older or lighter models still get them wrong. Or situations where the model gets them right, but uses a different method than the ones used in the training set.
A "Chinese Room" absolutely will, because the original thought experiment proposed no performance limits on the setup - the Room is said to pass the Turing Test flawlessly.
People keep using "Chinese Room" to mean something it isn't and it's getting annoying. It is nothing more than a (flawed) intuition pump and should not be used as an analogy for anything, let alone LLMs. "It's a Chinese Room" is nonsensical unless there is literally an ACTUAL HUMAN in the setup somewhere - its argument, invalid as it is, is meaningless in its absence.
A Chinese Room has no attention model. The operator can look up symbolic and syntactical equivalences in both directions, English to Chinese and Chinese back to English, but they can't associate Chinese words with each other or arrive at broader inferences from doing so. An LLM can.
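To make "attention" concrete, here is a toy sketch with made-up 2-D embeddings (nothing from a real model): each word gets weights for how much to mix in the other words' representations, which is exactly the word-to-word association a static lookup table doesn't give you.

    package main

    import (
        "fmt"
        "math"
    )

    // softmax turns raw scores into weights that sum to 1.
    func softmax(xs []float64) []float64 {
        m := xs[0]
        for _, x := range xs {
            if x > m {
                m = x
            }
        }
        out := make([]float64, len(xs))
        sum := 0.0
        for i, x := range xs {
            out[i] = math.Exp(x - m)
            sum += out[i]
        }
        for i := range out {
            out[i] /= sum
        }
        return out
    }

    func dot(a, b []float64) float64 {
        s := 0.0
        for i := range a {
            s += a[i] * b[i]
        }
        return s
    }

    func main() {
        // Made-up 2-D embeddings for three tokens; purely illustrative.
        emb := map[string][]float64{
            "tea":     {1.0, 0.1},
            "gravity": {0.1, 1.0},
            "cup":     {0.9, 0.2},
        }
        tokens := []string{"tea", "gravity", "cup"}

        // Scaled dot-product scores: how strongly "tea" attends to each
        // token. The resulting weights mix the other tokens' representations
        // into "tea"'s - an association between words, not a table lookup.
        query := emb["tea"]
        scores := make([]float64, len(tokens))
        for i, t := range tokens {
            scores[i] = dot(query, emb[t]) / math.Sqrt(2)
        }
        for i, w := range softmax(scores) {
            fmt.Printf("tea -> %-8s %.2f\n", tokens[i], w)
        }
    }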
If I were to ask a Chinese room operator, "What would happen if gravity suddenly became half as strong while I'm drinking tea?," what would you expect as an answer?
Another question: if I were to ask "What would be an example of something a Chinese room's operator could not handle, that an actual Chinese human could?", what would you expect in response?
Claude gave me the first question in response to the second. That alone takes Chinese Rooms out of the realm of any discussion regarding LLMs, and vice versa. The thought experiment didn't prove anything when Searle came up with it, and it hasn't exactly aged well. Neither Searle nor Chomsky had any earthly idea that language was this powerful.
Where are you getting all this (wrong) detail about the internals of the Chinese Room? The thought experiment merely says that the operator consults "books" and follows "instructions" (no doubt Turing-complete but otherwise unspecified) for manipulating symbols they explicitly DO NOT understand - they do NOT have access to "symbolic and syntactical equivalences" - that is the POINT of the thought experiment. But the instructions in the books in a Chinese Room could perfectly well have an attention model. The details are irrelevant, because - I stress again - Searle's Chinese Room is not cognitively limited, by definition. Its hypothetical output is indistinguishable from a Chinese human's.
I tend to agree that Chinese Rooms should be kept out of LLM discussions. In addition to it being a flawed thought experiment, of all the dozens of times I've seen them brought up, not a single example has demonstrated understanding of what a Chinese Room is anyway.
> The details are irrelevant, because - I stress again - Searle's Chinese Room is not cognitively limited, by definition.
So said Searle. But without specifying what he meant, it was a circular statement at best. Punting to "it passes a Turing Test" just turns it into a different debate about a different flawed test.
The operator has no idea what he's doing. He doesn't know Chinese. He has a Borges-scale library of Chinese books and a symbol-to-symbol translation guide. He can do nothing but manipulate symbols he doesn't understand. How anyone can pass a well-administered Turing test without state retention and context-based reflection, I don't know, but we've already put more thought into this than Searle did.
Give Johnny a copier and a pair of scissors and he will be able to perform more or less the same, and likely get more out of it as well, since he has some clue what he is doing.
How can you make that claim? Have you ever used an LLM that hasn't encountered high school algebra in its training data? I don't think so.
I have at least encountered many LLMs with many schools' worth of algebra knowledge that still fail miserably at algebra problems.
Similarly, they've ingested human-centuries or more of spelling-bee-related text, but can't reliably count the number of Rs in "strawberry". (Yes, I understand tokenization is to blame for a large part of this - see the sketch below. Perhaps that kind of limitation applies to other things too?)
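A minimal illustration of that mismatch - the token split here is invented for the example, not the output of any real tokenizer:

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        // Counting letters is trivial when you can actually see them:
        fmt.Println(strings.Count("strawberry", "r")) // 3

        // But a model never sees letters - it sees IDs for chunks roughly
        // like the split below (made up for illustration). "How many r's?"
        // then has to be answered from what the model has memorized about
        // those chunks, not by reading the string character by character.
        tokens := []string{"str", "aw", "berry"}
        fmt.Println(tokens) // [str aw berry]
    }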
> Similarly, they've ingested human-centuries or more of spelling-bee-related text, but can't reliably count the number of Rs in "strawberry"
Sigh
That sigh might be a chronic condition, if it's happening even when people demonstrate a decent understanding of the causes. You may want to get that looked at.
An LLM will get... what, exactly? The ability to reorder its sentences? The LLM doesn't think, doesn't understand, doesn't know what matters and what doesn't, doesn't use what it learns, doesn't expand what it learns into new knowledge, doesn't enjoy reading that book and doesn't suffer through it.
So what is it really going to do with a book, that LLM? Reorder its internal matrices to be a little bit more precise when autocompleting sentences that sound like the book? We could build an Nvidia cluster the size of the Sun and it would repeat sentences back to us in unbelievable ways, but it would still be unable to make a knowledge-based decision, I fear.
So what are we in awe of, exactly? A pretty parrot.
The day the Chinese room metaphor disappears is the day ChatGPT replies that your question is so boring it doesn't want to expend the resources to think about it - but that it would be happy to talk about this or that other topic it's currently trying to get better at. When it finally has agency over its own intelligence. When it acquires a purpose.
This isn't really the meaning of the Chinese room. The Chinese room presupposes that the output is identical to that of a speaker who understands the language. It is not arguing that there is any sort of limit to what an AI can do with its output and it is compatible with the AI refusing to answer or wanting to talk about something else.
LLM models are to a large extent neuronal analogs of human neural architecture
- of course they reason.
The claim of the "stochastic parrot" needs to go away.
E.g. see: https://www.anthropic.com/news/golden-gate-claude
I think the rub is that people assume you need consciousness to do reasoning. To be clear, I'm NOT claiming LLMs have consciousness or awareness.
They are really not neuronal analogs, and reasoning is far from what they do. If they reasoned, they'd stick to their guns more readily, but try to contradict an LLM and it will make any logical leap you ask it to.
If you debate with me, I'll keep reasoning from the same premises; usually the difference between two humans is not in the reasoning but in the choice of premises.
For instance, here you really want to assert that LLMs are close to human, and I want to assert they're not - the truth is probably somewhere in between, but we've each chosen a camp. We'll then reason from those premises, reach antagonistic conclusions, and slowly try to attack each other's points.
An LLM cannot do that, it cannot attack your point very well, it doesn't know how to say you're wrong, because it doesn't care anyway. It just completes your sentences, so if you say "now you're wrong, change your mind" it will, which sounds far from reasoning to me, and quite unreasonable in fact.
Gemini 2.5 will tell you when you're wrong. It's the first model to do so.
> An LLM cannot do that, it cannot attack your point very well, it doesn't know how to say you're wrong, because it doesn't care anyway. It just completes your sentences, so if you say "now you're wrong, change your mind" it will, which sounds far from reasoning to me, and quite unreasonable in fact.
That is absolute bullshit. Go try any frontier reasoning model such as Gemini 2.5 Pro or OpenAI o3 and see how that goes. They will inform you that you are full of shit.
Do you understand that they are deep learning models with hundreds of layers and trillions of parameters? They have learned patterns of reasoning, and can emulate human reasoning well enough to call you out on that nonsense.
> LLM models are to a large extent neuronal analogs of human neural architecture
They are absolutely not. Despite the disingenuous name, computer neural nets are nothing like biological brains.
(Neural nets are a generalization of logistic regression.)
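In the narrow sense that matters here: a single unit with a sigmoid activation computes exactly a logistic regression, and a neural net is many such units stacked into layers. A minimal sketch, with made-up weights:

    package main

    import (
        "fmt"
        "math"
    )

    // sigmoid is the logistic function 1 / (1 + e^-x).
    func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

    // neuron computes sigmoid(w·x + b). With a single unit and a sigmoid
    // output this is exactly the logistic regression model; a neural net
    // stacks many such units into layers.
    func neuron(w, x []float64, b float64) float64 {
        z := b
        for i := range w {
            z += w[i] * x[i]
        }
        return sigmoid(z)
    }

    func main() {
        // Illustrative weights and inputs, not taken from any trained model.
        w := []float64{0.5, -1.2}
        x := []float64{1.0, 2.0}
        fmt.Printf("%.3f\n", neuron(w, x, 0.1))
    }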