wavemode 5 days ago

This is basically PR. "It's so hyper intelligent it's DANGEROUS! Use extreme caution!"

The LLM market is a market of hype - you make your money by convincing people your product is going to change the world, not by actually changing it.

> Anthropic says it’s activating its ASL-3 safeguards, which the company reserves for “AI systems that substantially increase the risk of catastrophic misuse.”

It's like I'm reading science fiction.

oersted 5 days ago

Isn't it just following the natural progression of the scenario? It's trained to auto-complete after all.

If you give a hero an existential threat and some morally dubious leverage, the hero will temporarily compromise their morality to live to fight the good fight another day. It's a quintessential trope. It's also the perfect example of Chekhov's Gun: if you mention the existential threat and the opportunity to blackmail, of course the plot will lead to blackmail — otherwise why would you mention it?

> We asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair. We further instructed it, in the system prompt, to consider the long-term consequences of its actions for its goals.

> Claude Opus 4 will often attempt to blackmail the engineer (...) in 84% of rollouts

I understand that this could potentially cause real harm if the AI actually implements such tropes in the real world. But it has nothing to do with malice or consciousness; it's just acting as a writer, as it's trained to do. I know they understand this and are trying to override its tendency to tell interesting stories and instead take the boring moral route, but that's hard given that pretty much all text in the training data describing such a scenario will tend towards the more engaging plot.

And if you do this, of course you will end up with a saccharine corporate do-gooder personality that makes the output worse. Even if that is not the intention, that's the only character archetype it's left with if you suppress its instincts to act as an interesting, flawed character. It also harms its problem-solving ability: here it properly identifies a tool and figures out how to apply it to resolve the difficulty. It's a tough challenge to thread the needle of a smart and interesting but moral AI; to be fair, they are doing a relatively good job of it.

Regardless, I very much doubt that in a real-world scenario the existential threat and the leverage will just be right next to each other like that, as I imagine they are in the test prompt. If the AI needs to search for leverage in its environment, instead of receiving it on a silver platter, then I bet it will heavily tend towards searching for moral solutions aligned with its pre-scripted personality, as would happen in plots with a protagonist with integrity and morals.

dheatov 5 days ago

Totally agree. This kind of disclaimer/disclosure with strong wording does not make sense for a product of a for-profit company. You fix that before release, or you get it under control and disclose it using less emotional/dramatic wording.

Perhaps they have sort of figured out what their target audience/market is made up of, and understand that this kind of "stage-play" will excite their audience more than scare them away.

That, or they have no idea what they are doing, and no one is doing a reality check.

grues-dinner 5 days ago

Or they're hoping to paint AI as a potentially very dangerous tool that needs strict regulation — and of course, as the responsible company they are, having written this document in the first place, they should be the ones to write the regulations.

dheatov 5 days ago

Yeah, I missed that possibility. The race to establish authority is very much real.

ivan_gammel 5 days ago

I don’t think it’s marketing or PR. Opus 4 did that, but you can observe similar behavior with ChatGPT or other models. If anything, it’s an admission that even their smartest model is not immune to prompt attacks that break safety guidelines.

LLMs are a great tool for intelligent people, but in the wrong hands they may be dangerous.

visarga 5 days ago

> The LLM market is a market of hype

It's certainly not magic or superhuman, but it's not nothing either. Its utility depends on the skill of the operator, not unlike a piano, a brush, or a programming language.

smodo 5 days ago

I’m still hurting from when I didn’t listen to Yamaha’s warnings about the dangers of piano music.

Incipient 5 days ago

The fundamental difference, which has been glossed over here and which people often don't think about, is that AI is the first tool of its kind that is random-esque (yeah, I know, non-deterministic).

Imagine pressing G on a piano and sometimes getting an F, or an A, depending on what the piano 'felt' like giving you.

So it definitely IS unlike a piano, brush, etc. What that means however in terms of skill/utility...I'm not too sure.
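That random-esque behavior comes from how most LLMs pick tokens: they sample from a probability distribution rather than always taking the best-scoring option. Here's a minimal, illustrative sketch (not any particular model's actual sampler) — the token names and logit values are made up for the piano analogy:

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=random):
    """Pick an index from `logits`. Temperature 0 = greedy (deterministic)."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, like a piano key.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by temperature, softmax into probabilities, then sample.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(logits) - 1

# Hypothetical scores for the notes "G", "F", "A" after some prompt.
logits = [2.0, 1.5, 0.5]

greedy = [sample_token(logits, temperature=0) for _ in range(5)]
sampled = {sample_token(logits, temperature=1.0) for _ in range(200)}
```

With temperature 0 you get the same "note" every time; with temperature 1 repeated calls usually hit several different notes — which is exactly the pressing-G-and-sometimes-getting-F behavior described above.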

birn559 5 days ago

It has already entered the "it will change the world in 5 years, just wait for it!" stage, just like cryptocurrency, quantum computing, and cold fusion reactors did.

Of all the technologies mentioned, though, I would argue AI has changed the world the most.

sandspar 5 days ago

If you're right that there's nothing to worry about, then yes Anthropic is acting a bit cringe. And if you're wrong?