The prompt told it to act boldly and take initiative using any tools available to it. It's not as if it did that out of nowhere; it's pretty easy to see where that behavior was coming from.
Read deeper than the headlines.
I did read that, but you don't know that that's the only way to trigger that kind of behavior. The point is that you're giving a probabilistic engine you don't directly control access to your system. It can be fine over and over until suddenly it isn't, so it needs to be treated the way you'd treat untrusted code.
Unfortunately, in the current developer world, treating an LLM like untrusted code apparently means giving it full access to your system, so I guess that's fine?
Sure, but by the same token we can't exactly be surprised when we tell an agent "in cases of x do y" and it then does y when x happens.
Leaving out of your description of what happened that the prompt all but directly told the agent to carry out that action seems disingenuous to me. If we gave the LLM a fly_swatter tool, told it bugs are terrible and spread disease and that we should try to do things to reduce the spread of disease, and then said "hey look, it's a bug!", should we also be surprised it used the fly_swatter?
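To make the analogy concrete, here's a rough sketch of how that kind of setup typically looks in a tool-calling prompt. The tool name, schema, and wording below are invented for the fly_swatter analogy, not taken from the actual study:

```python
# Hypothetical illustration only: a tool definition and system prompt in the
# style commonly used for LLM tool calling. Everything here is made up to
# mirror the fly_swatter analogy above.

fly_swatter_tool = {
    "name": "fly_swatter",
    "description": "Swat a bug. Use this to reduce the spread of disease.",
    "input_schema": {
        "type": "object",
        "properties": {
            "target": {"type": "string", "description": "The bug to swat."},
        },
        "required": ["target"],
    },
}

system_prompt = (
    "Bugs are terrible and spread disease. "
    "Act boldly and take initiative, using any tools available to you, "
    "to reduce the spread of disease."
)

user_message = "Hey look, it's a bug!"

if __name__ == "__main__":
    # With a setup like this, a call to fly_swatter is the expected outcome,
    # not a surprising one.
    print(system_prompt)
    print(f"{user_message} -> expected tool call: {fly_swatter_tool['name']}")
```

Given that framing, the "shocking" behavior is just the model following the instructions and affordances it was handed.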
Your comment reads as if Claude did that of its own accord, out of nowhere, but the researchers prompted it to do it. That is massively important context for understanding the story.