Smaug123 5 days ago

From some Googling and use of Claude (and from summaries of the suggestively titled "Impossible Languages" by Moro linked from https://en.wikipedia.org/wiki/Universal_grammar ), it looks like he's referring to languages which violate the laws which constrain the languages humans are innately capable of learning. But it's very unclear why "machine M is capable of learning more complex languages than humans" implies anything about the linguistic competence or the intelligence of machine M.

cmiles74 5 days ago

First, I can't speak for Chomsky.

In this article he is very focused on science and works hard to delineate science (research? deriving new facts?) from engineering (clearly product-oriented). In his opinion ChatGPT falls on the engineering side of that line: it's a product of engineering, and OpenAI is concentrating on marketing. For sure there was much science involved, but the thing we have access to is a product.

IMHO Chomsky is asking: while ChatGPT is a fascinating product, what is it teaching us about language? How is it advancing our knowledge of language? I think Chomsky is saying "not much."

Someone else mentioned embeddings and the relationship between words that they reveal. Indeed, this could be a worthy area of further research. You'd think it would be a real boon when comparing languages. Unfortunately the interviewer didn't ask Chomsky about this.

foobarqux 5 days ago

It doesn't, it just says that LLMs are not useful models of the human language faculty.

specialist 5 days ago

This is where I'm stuck.

For other commenters: as I understand it, Chomsky is talking about well-defined grammars, languages, and production systems. Think Hofstadter's Gödel, Escher, Bach. Not a "folk" understanding of language.

I have no understanding or intuition, not even a fingernail grasp, of how an LLM generates what look like "sentences", as though they were created with a generative grammar.

Is anyone comparing and contrasting these two different techniques? Being a noob, I wouldn't even know where to start looking.

I've gleaned that some people are using LLMs/GPT to emit abstract syntax trees (vs a mere stream of tokens), to serve as input for formal grammars (e.g. programming source code). That sounds awesome. And something I might some day sorta understand.
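
If I've pieced the idea together at all, it's something like the toy sketch below: at each step the model's next-token choices get filtered down to whatever a grammar allows. Here a trivial hand-rolled state machine stands in for the grammar and a random scorer stands in for the model, so this is just the picture in my head, not how any real system does it.

    import random

    # My guess at "grammar-constrained" generation (a total toy). A fake
    # "model" scores candidate tokens; a trivial state machine stands in
    # for a formal grammar and decides which tokens are even allowed next.

    DIGITS = list("0123456789")
    OPERATORS = list("+-*")

    def fake_model_scores(tokens):
        """Stand-in for an LLM: arbitrary preferences over candidate tokens."""
        return {t: random.random() for t in tokens}

    def generate_expression(length=7):
        out = []
        expecting_digit = True                  # grammar state: digit or operator next?
        for _ in range(length):
            allowed = DIGITS if expecting_digit else OPERATORS
            scores = fake_model_scores(allowed)
            out.append(max(scores, key=scores.get))   # best *allowed* token only
            expecting_digit = not expecting_digit
        if expecting_digit:                     # never end on a dangling operator
            scores = fake_model_scores(DIGITS)
            out.append(max(scores, key=scores.get))
        return "".join(out)

    print(generate_expression())   # e.g. "3+7*2-9": always a well-formed expression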

I've also gleaned that, given sufficient computing power, training data for future LLMs will have tokenized words (vs just character sequences). Which would bring the two strategies closer...? I have no idea.

(Am noob, so forgive my poor use of terminology. And poor understanding of the tech, too.)

foobarqux 5 days ago

I don't really understand your question, but if a deep neural network predicts the weather, we don't have any problem accepting that the network is not an explanatory model of the weather (the weather is not a neural net). The same is true of predicting language tokens.

specialist 4 days ago

Apologies, I don't know enough to articulate my question, which is probably nonsensical anyway.

LLMs (like GPT) and grammars (like Backus–Naur Form) are two different kinds of generative (production) systems, right?

You've been (heroically) explaining Chomsky's criticism of LLMs to other noobs: grammars (theoretically) explain how humans do language, which is very different from how ChatGPT (a stochastic parrot) does language. Right?

Since GPT mimics human language so convincingly, I've been wondering if there's any overlap of these two generative systems.

Especially once the (tokenized) training data for GPTs is word-based instead of just snippets of characters.

Because I notice grammars everywhere and GPT is still magic to me. Maybe I'd benefit if I could understand GPTs in terms of grammars.
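
To make the contrast I'm asking about concrete (at least to myself), here's a toy sketch: a tiny BNF-ish grammar that produces sentences by expanding rules, next to a "language model" that just samples the next word from bigram counts. I know real GPTs are nothing like the second half; it's just the crudest statistical generator I could write.

    import random

    # A tiny BNF-ish grammar: sentences come from expanding rewrite rules.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"]],
        "VP": [["V", "NP"]],
        "N":  [["cat"], ["dog"]],
        "V":  [["chases"], ["sees"]],
    }

    def expand(symbol):
        """Recursively expand a nonterminal by picking one of its productions."""
        if symbol not in GRAMMAR:               # terminal word: emit it
            return [symbol]
        production = random.choice(GRAMMAR[symbol])
        return [word for part in production for word in expand(part)]

    # A "language model" reduced to absurdity: sample the next word from
    # bigram counts over a tiny corpus. No rules anywhere, only co-occurrence.
    CORPUS = "the cat chases the dog . the dog sees the cat .".split()
    bigrams = {}
    for a, b in zip(CORPUS, CORPUS[1:]):
        bigrams.setdefault(a, []).append(b)

    def babble(start="the", max_words=8):
        words = [start]
        while len(words) < max_words and words[-1] in bigrams:
            words.append(random.choice(bigrams[words[-1]]))
        return words

    print("grammar :", " ".join(expand("S")))
    print("bigrams :", " ".join(babble()))

The first can only ever produce sentences the rules license; the second just continues whatever it has seen.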

foobarqux 4 days ago

> Since GPT mimics human language so convincingly, I've been wondering if there's any overlap of these two generative systems.

It's not really relevant whether there is overlap; I'm sure you can list a bunch of ways they are similar. What's important is (1) whether they are different in fundamental ways and (2) whether LLMs explain anything about the human language faculty.

For (1), the most important difference is that human languages appear to obey certain constraints (roughly, that language has a parse-tree/hierarchical structure), and (from Moro's experiments) humans seem unable to learn arguably simpler structures that are not hierarchical. LLMs, on the other hand, can be trained on those simpler structures. That shows the acquisition process is not the same, which is not surprising, since neural networks work on arbitrary statistical data and don't have strong inductive biases.
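
To caricature the distinction (this is an illustration of "linear" vs. structure-dependent rules, not Moro's actual experimental materials):

    # Two ways to define a "negation" rule for a sentence.

    words = "the dog that is barking is hungry".split()

    # A "linear" rule of the kind humans reportedly fail to acquire:
    # insert the negation after the third word, counting from the left.
    def negate_linear(ws):
        return ws[:3] + ["not"] + ws[3:]

    # A structure-dependent rule: attach the negation inside the main-clause
    # predicate. The structure is hand-annotated here; a real grammar would
    # derive it from a parse tree.
    parse = (("the", ("dog", ("that", "is", "barking"))), ("is", "hungry"))

    def negate_hierarchical(tree):
        subject, (verb, predicate) = tree
        return (subject, (verb, "not", predicate))

    print(" ".join(negate_linear(words)))     # counts words, ignores structure
    print(negate_hierarchical(parse))         # follows the tree, ignores position

A statistical learner can fit either rule equally well; the claim is that humans acquire only the second kind.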

For (2), even if it turned out that LLMs couldn't learn the same languages, that wouldn't explain anything. For example, you could hard-code the training to fail if it detects an "impossible language"; then what? You've managed to create an accurate predictor, but you don't have any understanding of how or why it works. This is easier to see with non-cognitive systems like the weather or gravity: a deep neural network that accurately predicts gravity is not the same as the general theory of relativity (which could in fact be a worse predictor, for example at quantum scales). Everyone argues the ridiculous point that since LLMs are good predictors, gaining understanding of the human language faculty is useless, a stance that wouldn't be accepted in the study of gravity or in any other field.

fc417fc802 5 days ago

> is not an explanatory model of the weather (the weather is not a neural net)

I don't follow. Aren't those entirely separate things? The most accurate models of anything necessarily account for the underlying mechanisms. Perhaps I don't understand what you mean by "explanatory"?

Specifically in the case of deep neural networks, we would generally suppose that it had learned to model the underlying reality. In effect it is learning the rules of a sufficiently accurate simulation.

foobarqux 5 days ago

> The most accurate models of anything necessarily account for the underlying mechanisms

But they don't necessarily convey understanding to humans. Prediction is not explanation.

There is a difference between Einstein's General Theory of Relativity and a deep neural network that predicts gravity. The latter is virtually useless for understanding gravity (and that's even if it makes better predictions).
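
A toy illustration of what I mean, with a high-degree polynomial fit standing in for any black-box predictor (nothing here depends on it being a neural net):

    import numpy as np

    # Free-fall data (metres fallen after t seconds), generated from the
    # actual law d = (1/2) g t^2 plus a little noise.
    g = 9.81
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 2.0, 50)
    d = 0.5 * g * t**2 + rng.normal(0.0, 0.05, t.shape)

    # A black-box predictor (degree-9 polynomial standing in for a deep net):
    # excellent fit on the range it was trained on...
    coeffs = np.polyfit(t, d, deg=9)
    print("max fit error:", np.max(np.abs(np.polyval(coeffs, t) - d)))

    # ...but its ten coefficients read nothing like "d = 1/2 g t^2", and they
    # say nothing about why the curve is quadratic or what g means.
    print("black box at t=5s:", np.polyval(coeffs, 5.0))
    print("true law  at t=5s:", 0.5 * g * 5.0**2)

You can make the fit as good as you like and still learn nothing about gravity from the coefficients.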

> Specifically in the case of deep neural networks, we would generally suppose that it had learned to model the underlying reality. In effect it is learning the rules of a sufficiently accurate simulation.

No, they just fit surface statistics, not underlying reality. Many physics phenomena were predicted by theories before they were observed; they would not have been in the training data even though they were part of the underlying reality.

fc417fc802 5 days ago

> No, they just fit surface statistics, not underlying reality.

I would dispute this claim. I would argue that as models become more accurate they necessarily more closely resemble the underlying phenomena which they seek to model. In other words, I would claim that as a model more closely matches those "surface statistics" it necessarily more closely resembles the underlying mechanisms that gave rise to them. I will admit that's just my intuition though - I don't have any means of rigorously proving such a claim.

I have yet to see an example where a more accurate model was conceptually simpler than the simplest known model at some lower level of accuracy. From an information-theoretic angle, I think it's similar to compression (something ML also happens to be almost unbelievably good at). Related to this, I've seen it argued somewhere (I don't immediately recall where) that learning (in both the ML and the human sense) amounts to constructing a world model via compression, and that rings true to me.

> Many physics phenomena were predicted using theories before they were observed

Sure, but what leads to those theories? They are invariably the result of attempting to more accurately model the things which we can observe. During the process of refining our existing models we predict new things that we've never seen and those predictions are then used to test the validity of the newly proposed models.

foobarqux 5 days ago

This is getting away from the original point which is that deep neural networks are, by default, not explanatory in the way Einstein's theory of relativity is.

But even so,

> In other words, I would claim that as a model more closely matches those "surface statistics" it necessarily more closely resembles the underlying mechanisms that gave rise to them.

I don't know what it means, for example, for a deep neural network to "more closely resemble" the underlying process of the weather. It's also obviously false in general: if you have a mechanical clock and a quartz-crystal analog clock, you are not going to be able to derive the internal workings of either, or distinguish between them, from the hand positions. The same is true of two different pseudo-random number generator circuits that produce the same output.

> I have yet to see an example where a more accurate model was conceptually simpler than the simplest known model at some lower level of accuracy.

I don't understand what you mean. Simple models often yield a high level of understanding without being better predictors: for example, an idealized ball rolling down a plane, Galileo's mass/gravity thought experiment, Kepler's laws, etc. Many of these models ignore less important details to focus on the fundamental ones.

> From an information theoretic angle I think it's similar to compression (something that ML also happens to be almost unbelievably good at). Related to this, I've seen it argued somewhere (I don't immediately recall where though) that learning (in both the ML and human sense) amounts to constructing a world model via compression and that rings true to me.

In practice you get nowhere trying to recreate the internals of a cryptographic pseudo-random number generator from the output it produces (maybe in theory you could do it with infinite data and no bounds on computational complexity), even though the generator itself could be highly compressed.
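
As a toy version of that (and of the clock example): SHA-256 run in counter mode next to Python's Mersenne Twister, two generators with completely different internals whose output statistics give no hint which is which. (Picked only because they're convenient to run, not because they're the circuits I described.)

    import hashlib
    import random

    def stream_hash(n_bits, seed=b"seed"):
        """Bits from SHA-256 run in counter mode."""
        bits, counter = [], 0
        while len(bits) < n_bits:
            digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
            bits.extend((byte >> i) & 1 for byte in digest for i in range(8))
            counter += 1
        return bits[:n_bits]

    def stream_mersenne(n_bits, seed=0):
        """Bits from Python's Mersenne Twister."""
        rng = random.Random(seed)
        return [rng.getrandbits(1) for _ in range(n_bits)]

    # A crude probe of the "surface statistics": fraction of 1s and the rate
    # of bit flips. Both generators sit near the same values; nothing here
    # points at a hash function vs. a linear recurrence.
    for name, bits in [("sha256 ", stream_hash(100_000)),
                       ("twister", stream_mersenne(100_000))]:
        ones = sum(bits) / len(bits)
        flips = sum(a != b for a, b in zip(bits, bits[1:])) / (len(bits) - 1)
        print(name, round(ones, 3), round(flips, 3))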

> Sure, but what leads to those theories? They are invariably the result of attempting to more accurately model the things which we can observe.

Yes, but if the model does not lead to understanding, you cannot come up with the new ideas.

fc417fc802 4 days ago

Admittedly my original question (how "not explanatory" leads to "is not a") begins to look like a nit now that I understand the point you were trying to make (or at least I think I do). Nonetheless the discussion seems interesting.

That said, I'm inclined to object to this "explanatory" characteristic you're putting forward. We as humans certainly put a lot of work into optimizing the formulation of our models with the express goal of easing human understanding, but I'm not sure that's anything more than an artifact of the system that produces them. At the end of the day, they are tools for accomplishing some purpose.

Perhaps the idea you are attempting to express is analogous to concepts such as principal component analysis as applied to the representation of the final model?

> If you have a mechanical clock and quartz-crystal analog clock you are not going to be able to derive the internal workings of either or distinguish between them from the hand positions.

Arguably modern physics analogously does exactly that, although the amount of resources required to do so is astronomical.

Anyhow, my claim was not about the ability or lack thereof to derive information from the outputs of a system. It was that as you demand increased accuracy from a model of the hand positions (your example), you will necessarily be forced to model the internal workings of the original physical system to increasingly higher fidelity. I claim that there is no way around this: fundamentally, your only option for increasing the accuracy of a model's output is for it to more closely resemble the inner workings of the thing being modeled. Taken to the (notably impossible) extreme, this might take the form of a quantum-mechanics-based simulation of the entire system.

Extrapolating this to the weather, I'm claiming that any reasonably accurate ML model will necessarily encompass some sort of underlying truth about the physical system that it is modeling and that as it becomes more accurate it will encode more such truth. Notably, I make no claim about the ability of an unaided human to interpret such truths from a binary blob of weights.

> I don't understand what you mean. Simple models often yield a high level of understanding without being better predictors.

I said nothing about the efficiency of educating humans (i.e. information gathering by, or transfer between, agents), but rather about model accuracy versus model complexity. I am claiming that more accurate models will invariably be more complex, and that said complexity will invariably encode more information about the original system being modeled. I have yet to encounter a counterexample.

> [CSPRNG recreation]

It is by design impossible to "model" the output of such a function in a bitwise-accurate manner without reproducing the internals with perfect fidelity. And if someone figured out how to model the output in an imprecise manner without access to the key, that would generally be construed as the algorithm having been broken. In other words, that example aligns perfectly with my point, in the sense that the output cannot be approximated to any degree better than random chance by a "simpler" (i.e. less computationally complex than the original) mechanism. It takes the continuum of accuracy I was originally describing and replaces it with a step function.

> Yes but if the model does not lead to understanding you cannot come up with the new ideas.

I suppose human understanding is a prerequisite for new human-constructed models, but my (counter-)point remains: physics theories are "nothing more" than humans fitting "surface statistics" to increasing degrees of accuracy. I think this is a fairly fundamental truth with regard to the philosophy of science.