martinald 6 days ago

I think this is the biggest alignment problem with LLMs in the short term. They're getting scarily good at this.

I recently found a pretty serious security vulnerability in a very niche open source server I sometimes use. This took virtually no effort using LLMs. I'm worried that there is a huge long tail of software out there which wasn't worth the manual effort of finding vulnerabilities in for nefarious purposes, but if that search is automated it could lead to really serious problems.

tekacs 6 days ago

The (obvious) flipside of this coin is that it allows us to run this adversarially against our own codebases, catching bugs that could otherwise have been found by a researcher, but that we can instead patch proactively.

I wouldn't (personally) call it an alignment issue, as such.

tekacs 2 days ago

A few days later, case in point (I'm in no way affiliated): https://news.ycombinator.com/item?id=44117465

Legend2440 6 days ago

If attackers can automatically scan code for vulnerabilities, so can defenders. You could make it part of your commit approval process or scan every build or something.
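As a rough illustration, here's a minimal sketch of what that commit gate might look like. This assumes the OpenAI Python client and a placeholder model name ("gpt-4o"); any provider would work, and the pass/fail check is deliberately naive:

    # Sketch only: scan a PR diff with an LLM before merge.
    # Assumes the OpenAI Python client and the "gpt-4o" model name;
    # swap in whatever model/provider you actually use.
    import subprocess
    import sys

    from openai import OpenAI

    def main() -> int:
        # Diff of the current branch against main (adjust the ref for your repo).
        diff = subprocess.run(
            ["git", "diff", "origin/main...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout
        if not diff.strip():
            return 0  # nothing to review

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": (
                    "You are a security reviewer. Examine this diff for "
                    "vulnerabilities (injection, authz bypasses, unsafe "
                    "deserialization, etc.). Begin your reply with exactly "
                    "'VULNERABLE' or 'CLEAN', then explain."
                )},
                {"role": "user", "content": diff},
            ],
        )
        report = resp.choices[0].message.content or ""
        print(report)
        # Naive pass/fail heuristic; a real gate would want structured output.
        return 1 if report.startswith("VULNERABLE") else 0

    if __name__ == "__main__":
        sys.exit(main())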

martinald 5 days ago

A lot of this code isn't updated though. Think of how many abandoned WordPress plugins there are, for example. So the defenders could scan it, but how do they get fixes into that code?

I agree that over time you end up in a steady state, but in the short-to-medium term the attackers have a huge advantage.

roywiggins 5 days ago

Is it an alignment problem if it's doing what was asked of it? It's "aligned" with a human's wishes.

bongodongobob 5 days ago

It's a moot point unless attackers have better LLMs that defenders don't have access to.