"Expert in (now-)ancient arts draws strange conclusion using questionable logic" is the most generous description I can muster.
Quoting Chomsky:
> These considerations bring up a minor problem with the current LLM enthusiasm: its total absurdity, as in the hypothetical cases where we recognize it at once. But there are much more serious problems than absurdity.
> One is that the LLM systems are designed in such a way that they cannot tell us anything about language, learning, or other aspects of cognition, a matter of principle, irremediable... The reason is elementary: The systems work just as well with impossible languages that infants cannot acquire as with those they acquire quickly and virtually reflexively.
Response from o3:
LLMs do surface real linguistic structure:
• Hidden syntax: Attention heads in GPT-style models line up with dependency trees and phrase boundaries—even though no parser labels were ever provided. Researchers have used these heads to recover grammars for dozens of languages.
• Typology signals: In multilingual models, languages that share word-order or morphology cluster together in embedding space, letting linguists spot family relationships and outliers automatically.
• Limits shown by contrast tests: When you feed them “impossible” languages (e.g., mirror-order or random-agreement versions of English), perplexity explodes and structure heads disappear—evidence that the models do encode natural-language constraints.
• Psycholinguistic fit: The probability spikes LLMs assign to next-words predict human reading-time slow-downs (garden-paths, agreement attraction, etc.) almost as well as classic hand-built models.
These empirical hooks are already informing syntax, acquisition, and typology research—hardly “nothing to say about language.”
> LLMs do surface real linguistic structure...
It's completely irrelevant, because the point he's making is that LLMs operate differently from the human language faculty, as evidenced by the fact that they can learn language structures that humans cannot learn. Put another way, I'm sure you can point out an infinitude of similarities between the human language faculty and LLMs, but it's the critical differences that make LLMs not useful models of human language ability.
> When you feed them “impossible” languages (e.g., mirror-order or random-agreement versions of English), perplexity explodes and structure heads disappear—evidence that the models do encode natural-language constraints.
This is confused. You can pre-train an LLM on English or an impossible language and they do equally well. Humans, on the other hand, can't do that; ergo LLMs aren't useful models of human language, because they lack this critical distinguishing feature.
Is that true? This paper claims it is not.
Yes, it's true. You can read my response to one of the authors, @canjobear, describing the problem with that paper in the comment linked below. But to summarize: to show what they want to show, they would have to take the simple, interesting languages based on linear order that Moro showed humans cannot learn and show that LLMs can't learn them either, and they don't do that.
The Moro languages are of interest because they are computationally simple, so it's a puzzle why humans can't learn them (and no surprise that LLMs can). The authors of the paper miss the point and show irrelevant things, such as that there exist complicated languages that neither humans nor LLMs can learn.
> You can pre-train an LLM on English or an impossible language and they do equally well
It's impressive that LLMs can learn languages that humans cannot. In what frame is this a negative?
Separately, "impossible language" is a pretty clear misnomer. If an LLM can learn it, it's possible.
The latter. Moro showed that you can construct simple language rules, in particular linear rules (e.g., the third word of every sentence modifies the noun), that humans have a hard time learning: in MRI scans they use different parts of their brain, and they take longer to process these languages than control languages. Such rules differ from conventional human language structure, which is hierarchical and structure-dependent, i.e. roughly, words are interpreted according to their position in a parse tree, not their linear order.
That's what "impossible language" means in this context, not something like computationally impossible or random.
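To make the linear vs. structure-dependent distinction concrete, here's a toy sketch in Python. It's my own illustration, not Moro's actual stimuli: the `NEG` marker, the tiny hand-built trees, and the function names are all made up for the example. The linear rule only counts words; the structural rule looks at where the verb phrase sits in the tree.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """A minimal parse-tree node; children are sub-nodes or plain words."""
    label: str
    children: List[Union["Node", str]]


def leaves(node: Node) -> List[str]:
    """Flatten a tree into its left-to-right word sequence."""
    out: List[str] = []
    for child in node.children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out


def negate_linear(words: List[str], marker: str = "NEG") -> List[str]:
    """Hypothetical 'impossible' rule: insert the marker after the third word,
    ignoring structure entirely."""
    return words[:3] + [marker] + words[3:]


def negate_structural(tree: Node, marker: str = "NEG") -> List[str]:
    """Hypothetical structure-dependent rule: insert the marker right before
    the VP that is a direct child of the root, however long the subject is."""
    out: List[str] = []
    for child in tree.children:
        if isinstance(child, Node) and child.label == "VP":
            out.append(marker)
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out


# "the dog barks" and "the dog that I saw barks"
short = Node("S", [Node("NP", ["the", "dog"]), Node("VP", ["barks"])])
long_ = Node("S", [
    Node("NP", ["the", "dog", Node("RC", ["that", "I", "saw"])]),
    Node("VP", ["barks"]),
])

for tree in (short, long_):
    print("linear:    ", " ".join(negate_linear(leaves(tree))))
    print("structural:", " ".join(negate_structural(tree)))
```

Note that the linear rule is the computationally simpler of the two (it needs no parse at all), yet it drops the marker in the middle of the relative clause in the longer sentence, while the structural rule always lands next to the main verb. That's the sense in which the simple rule is the "impossible" one.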
OK then... what makes that a negative? You're describing a human limitation and a strength of LLMs.
It's not a negative; it's just not what humans do, which is the point Chomsky (a person studying what humans do) is making.
As I said in another comment this whole dispute would be put to bed if people understood that they don't care about what humans do (and that Chomsky does).
Suggestion for you then, in your first response you would have been clearer to say "The reason Chomsky seems like such a retard here, is because he clings to irrelevant nonsense"
It's completely unremarkable that humans are unable to learn certain languages, and soon it will be unremarkable when humans have no cognitive edge over machines.
Response: Science? "Ancient Linguistics" would more accurately describe Chomsky's field of study and its utility.
> Suggestion for you then, in your first response you would have been clearer to say "The reason Chomsky seems like such a retard here, is because he clings to irrelevant nonsense"
If science is irrelevant to you, it's you who should have recognized this before spouting off.