gitaarik 7 days ago

Yes ok, it can generate new stuff, but it's dependent on human curated reward models to score the output to make it usable. So it still depends on human thinking; its own "thinking" is not sufficient. And there won't be a point when human curated reward models are not needed anymore.

LLMs will make a lot of things easier for humans, because much of the thinking humans do has been automated into the LLM. But ultimately you run into a limit where the human has to take over.

JoshCole 7 days ago

> dependent on human curated reward models to score the output to make it usable.

This is a false premise, because there already exist systems, currently deployed, which are not dependent on human-curated reward models.

Refutations of your point include existing systems which generate a reward model based on some learned AI scoring function, allowing self-bootstrapping toward higher and higher levels.
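
A minimal sketch of that kind of self-bootstrapping loop, assuming a hypothetical policy_model and judge_model (RLAIF-style in general, not a description of any specific deployed system):

    # Preference data is scored by an AI judge, not a human.
    def generate_preference_data(policy_model, judge_model, prompts):
        dataset = []
        for prompt in prompts:
            a = policy_model.generate(prompt)
            b = policy_model.generate(prompt)
            # The judge model decides which answer is preferred.
            if judge_model.score(prompt, a) >= judge_model.score(prompt, b):
                dataset.append((prompt, a, b))  # (prompt, preferred, rejected)
            else:
                dataset.append((prompt, b, a))
        return dataset

    # A reward model trained on this data is bootstrapped from AI judgments:
    #   reward_model = train_reward_model(generate_preference_data(policy, judge, prompts))
    #   policy = rl_finetune(policy, reward_model)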

A different refutation of your point is the use of simulation contexts, for example by R1, in which code compilation is used as a reward signal; there the reward model comes from a simulator, not a human.
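
And a minimal sketch of a simulator-derived reward, assuming gcc is available on the machine (the function name is just an illustration):

    import os
    import subprocess
    import tempfile

    def compile_reward(c_source: str) -> float:
        # Reward comes from a compiler, not a human: 1.0 if the code compiles, else 0.0.
        with tempfile.TemporaryDirectory() as workdir:
            src = os.path.join(workdir, "candidate.c")
            with open(src, "w") as f:
                f.write(c_source)
            result = subprocess.run(
                ["gcc", src, "-o", os.path.join(workdir, "candidate")],
                capture_output=True,
            )
            return 1.0 if result.returncode == 0 else 0.0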

> So it still depends on human thinking

Since your premise was false your corollary does not follow from it.

> And there won't be a point when human curated reward models are not needed anymore.

This is just a repetition of your previously false statement, not a new one. You're probably becoming increasingly overconfident by restating falsehoods in different words, potentially giving the impression you've made a more substantive argument than you really have.

gitaarik 7 days ago

So to clarify, it could potentially come up with (something close to) C, but if you want it to get to D, E, F etc., it will become less and less accurate with each consecutive step, because it lacks the human curated reward models up to that point. Only if you create new reward models for C will the output for D improve, and so on.

JoshCole 7 days ago

> Only if you create new reward models for C will the output for D improve, and so on.

Again, tons of false claims. One is that 'you' have to create the reward model. Another is that it has to be human-curated at all. Yet another is that you need to do that at all: you can instead have the model build a bigger version of itself, train it using its existing resources or more of them, then synthesize it back down. Another way to get around it is to augment the existing dataset in some way. No other changes except resource usage, and yet the resulting model will be better, because more resources went into its construction.
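
A minimal sketch of the "synthesize it back down" step, i.e. plain knowledge distillation (a generic PyTorch illustration, not a claim about any particular lab's pipeline):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # The larger "teacher" model supervises the smaller "student",
        # so this step needs no new human-curated labels.
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2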

Seriously, notice: you keep making false claims again and again and again and again and again. You're not stating true things. You really need to reflect. If almost every sentence you speak on this topic is false, why do you think you should be able to persuade me of your views? Why should I believe your views, when you say so many things that are factually inaccurate, rather than my own?

gitaarik 6 days ago

Ok, so you claim that LLMs can get smarter without human validation. So why do they hallucinate at all? And why are all reward models currently curated by humans? Or are you claiming they aren't?

JoshCole 6 days ago

I don't find it reasonable that you didn't understand my corrections, because current AI models already do. So I'm exiting the conversation.

https://chatgpt.com/share/683a3c88-62a8-8008-92ef-df16ce2e8a...

gitaarik 6 days ago

Ok, this is indeed interesting and I'll investigate it further. But I think my points still stand. Let me elaborate.

An LLM only learns through input text. It doesn't have a first-person 3D experience of the world. So it can't execute physical experiments, or even understand them. It can understand texts about them, but it can't visualize them, because it doesn't have a visual experience.

And ultimately our physical world is governed by physical processes. So at the fundamentals of physical reality, LLMs lack understanding, and will therefore stay dependent on humans educating and correcting them.

You might get impressively far with all kinds of techniques, but you can't cross this barrier with just LLMs. If you want to, you have to give them senses like humans have, to give them an experience of the world, and make them understand these experiences. And sure, they're already working on that, but that is a lot harder to create than a comprehensive machine learning algorithm.

JoshCole 6 days ago

You're doing this thing again where you say tons of things that aren't true.

> An LLM only learns through input text.

This is false. There already exist LLMs that understand more than just text. Relevant search term: multi-modality.

> It doesn't have a first-person 3D experience of the world.

Again false. It is trivial to create such an experience with multi-modality. Just set up an input device that streams it to the model.

> So it can't execute physical experiments, or even understand them.

Here you get confused again. It doesn't follow, based on perceptual modality, that someone can't do or understand experiments. Helen Keller could be blind and still do an experiment.

Beyond just being confused, you also make another false claim. Current LLMs already have the capacity to run experiments and do so. Search terms: tool usage, ReAct loop, AI agents.
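
A bare-bones sketch of what a ReAct-style loop looks like, with llm, tools and parse_action as hypothetical stand-ins:

    def react_loop(llm, tools, task, max_steps=10):
        # Reason -> act -> observe: the model runs an "experiment" via a tool
        # and feeds the observation back into its own context.
        context = ["Task: " + task]
        for _ in range(max_steps):
            step = llm.generate("\n".join(context))
            context.append(step)
            if "Final Answer:" in step:
                return step
            tool_name, tool_input = parse_action(step)    # hypothetical parser
            observation = tools[tool_name](tool_input)    # the action actually executes
            context.append("Observation: " + str(observation))
        return None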

> It can understand texts about them, but it can't visualize them, because it doesn't have a visual experience.

Again, false!

Multi-modal LLMs currently possess the ability to generate images.

> And ultimately our physical world is governed by physical processes. So at the fundamentals of physical reality, LLMs lack understanding, and will therefore stay dependent on humans educating and correcting them.

Again false. The same sort of reasoning would claim that Helen Keller couldn't read a book, yet braille exists. The ability to acquire information from outside one's umwelt is a capability that intelligence enables.

gitaarik 5 days ago

You come up with very interesting points, and I'm thankful for that. But I also think you're missing the crux of my message. LLMs don't experience the world the same way humans do. And they also don't think in the same way. So you can train them very far with enough input data, but there will always be a limit to what they can understand compared to a human. If you want them to think about and experience the world in the same way, you basically have to create a complete human.

My example about visualization was just one example to prove a point. What I ultimately mean is the whole, complete human experience. And besides, if you give it eyes, what data are you going to train it on? Most videos on the internet are filmed with one lens, which doesn't give you a 3D visual. So you would have to train it like a baby growing up, by trial and error. And then again we're talking only about the visual sense.

Helen Keller wasn't born blind, so she did have a chance to develop her visual brain functions. Most people can visualize things with their eyes closed.

JoshCole 5 days ago

Chess engines cannot see like a human can. When they think, they don't necessarily think using the exact same method that a human uses. Yet train a chess engine for a very long time and it can actually end up understanding chess better than a human can.

I do understand the points you are attempting to make. The reason you're failing to prove your point is not because I am failing to understand the thrust of what you were trying to argue.

Imagine you were talking to a rocket scientist about engines, and your understanding of engines was predicated on your experience with cars. You start making claims about the nature of engines, and they disagree with you, they argue with you, and they point out all these ways that you're wrong. Is this person doing that because they're not able to understand your points? Or is it more likely that their experience with engines different from the ones you're used to gives them a different perspective, one that forces them to think about the world in a different way than you do?

gitaarik 5 days ago

Well, chess has a very limited set of rules and a very limited playing field. And the way to win at chess is to be able to think ahead, work out how all the moves could play out, and pick the best one. It is relatively easy to create an algorithm for that which surpasses humans; that is what computers are good at: executing specific algorithms very fast. A computer will always beat a human at that.
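
That "think ahead and pick the best move" approach is basically minimax search; a toy sketch, where evaluate, legal_moves and play are hypothetical game-specific helpers:

    def minimax(position, depth, maximizing):
        # Look ahead `depth` moves, assuming both sides play their best line.
        if depth == 0 or position.is_game_over():
            return evaluate(position)                   # hypothetical evaluation function
        scores = [minimax(position.play(move), depth - 1, not maximizing)
                  for move in position.legal_moves()]   # hypothetical move generator
        return max(scores) if maximizing else min(scores)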

So such algorithms can replace certain functions of humans, but they can't replace the human as a whole. And that is the same with LLMs. They save us time on repetitive tasks, but they can't replace all of our functions. In the end an LLM is a comprehensive algorithm constantly updated with machine learning. It's very helpful, but it has its limits. The limit keeps being pushed further, but it will never replace a full human. To do that you need a whole lot more than a comprehensive machine learning algorithm. They can get very close to something that looks like a human, but there will always be something lacking. That something can then again be improved upon, but you never reach the same level.

That is why I don't worry about AI taking our jobs. They replace certain functions, which will make our jobs easier. I don't see myself as a coder; I see myself as a system designer. I don't mind if AIs take over (certain parts of) the coding process (once they're good enough). It will just make software development easier and faster. I don't think there will be less demand for software developers.

It will change our jobs, and we'll have to adapt to that. But that is what always happens with new technology. You have to grow along with the changes and not expect that you can keep doing the same thing for the same value. But I think that for most software developers that isn't news. In the old days people were programming in assembly, then came compiled languages, and then higher-level languages. Now we have LLMs, which (when they become good enough) will just be another layer of abstraction.

vidarh 7 days ago

> And there won't be a point when human curated reward models are not needed anymore.

This doesn't follow at all. There's no reason why a model cannot be made to produce reward models.

gitaarik 7 days ago

But reward models are always curated by humans. If you generate a reward model with an LLM, it will contain hallucinations that need to be corrected by humans. But that is what a reward model is for: to correct the hallucinations of LLMs.

So yeah theoretically you could generate reward models with LLMs, but they won't be any good, unless they are curated by other reward models that are ultimately curated by humans.

vidarh 7 days ago

> But reward models are always curated by humans.

There is no inherent reason why they need to be.

> So yeah theoretically you could generate reward models with LLMs, but they won't be any good, unless they are curated by other reward models that are ultimately curated by humans.

This reasoning is begging the question: the premise is only true if the conclusion is already assumed. It's therefore a logically invalid argument.

There is no inherent reason why this needs to be the case.

gitaarik 7 days ago

Sorry but I don't follow your logic. Are you claiming that reward models that aren't curated by humans perform as well as ones that are?

Then what is a reward model's function according to you?

vidarh 6 days ago

I'm claiming exactly what I wrote: that there is no inherent reason why a human-curated one needs to be better.

JoshCole 7 days ago

In reinforcement learning and related fields, a _reward model_ is a function that assigns a scalar value (a reward) to a given state, representing how desirable it is. You're at liberty to have compound states: for example, a trajectory (often called tau) or a state-action pair (typically represented by s and a).
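
In code terms, it's just a scoring function; a trivial sketch (the state encoding and the toy reward below are purely illustrative):

    from typing import Sequence, Tuple

    State = dict   # whatever encodes the situation
    Action = str

    # R(s) -> float, R(s, a) -> float, or R(tau) -> float over a whole trajectory.
    def length_penalty_reward(trajectory: Sequence[Tuple[State, Action]]) -> float:
        # A toy reward model: prefer shorter trajectories.
        return -float(len(trajectory))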