Item 44083281

manmal • 6 days ago

If the LLM wrote a harness and proof of concept tests for its leads, then it might increase S/N dramatically. It’s just quite expensive to do all that right now.

threeseed • 6 days ago

Except that in my experience half the time it will modify the implementation in order to make the tests pass.

And it will do this no matter how many prompts you try or you forcefully you ask it.

1 reply

moyix • 6 days ago

With security vulnerabilities, you don't give the agent the ability to modify the potentially vulnerable software, naturally. Instead you make them do what an attacker would have to do: come up with an input that, when sent to the unmodified program, triggers the vulnerability.

How do you know if it triggered the vulnerability? Luckily for low-level memory safety issues like the ones Sean (and o3) found we have very good oracles for detecting memory safety, like KASAN, so you can basically just let the agent throw inputs at ksmbd until you see something that looks kind of like this: https://groups.google.com/g/syzkaller/c/TzmTYZVXk_Q/m/Tzh7SN...

klabb3 • 5 days ago

> If the LLM wrote a harness and proof of concept tests for its leads, then it might increase S/N dramatically.

Designing and building meaningfully testable non-trivial software is orders of magnitude more complex than writing the business logic itself. And that’s if you compare writing greenfield code from scratch. Making an old legacy code base testable in a way conducive to finding security vulns is not something you just throw together. You can be lucky with standard tooling like sanitizers and valgrind but it’s far from a panacea.