iandanforth 6 days ago

The most interesting and significant bit of this article for me was that the author ran this search for vulnerabilities 100 times for each of the models. That's significantly more computation than I've historically been willing to expend on most of the problems that I try with large language models, but maybe I should let the models go brrrrr!

4
seanheelan 5 days ago

I realised I didn't mention it in the article, so in case you're curious it cost about $116 to run the 100k token version 100 times.

egorfine 4 days ago

So, half that for batch processing [1], which presumably would be just fine for this task?

[1] https://platform.openai.com/docs/guides/batch
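The arithmetic behind these two comments is simple enough to sketch. A back-of-envelope calculation, assuming the reported $116 covers all 100 runs and that the Batch API's advertised 50% discount applies to this workload:

```python
# Cost sketch based on the figures quoted above.
total_cost_usd = 116.00   # reported cost for 100 runs of the 100k-token version
runs = 100

per_run = total_cost_usd / runs        # cost of a single run
batch_total = total_cost_usd * 0.50    # same 100 runs at the 50% batch discount

print(f"per run: ${per_run:.2f}")      # per run: $1.16
print(f"batched: ${batch_total:.2f}")  # batched: $58.00
```

So roughly a dollar per attempt at current pricing, or about $58 total if batch processing is acceptable for the task.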

wyldfire 5 days ago

How many years/generations behind o3 are the freely available / local models?

ramy_d 5 days ago

thank you, I was going to ask about this. It's not a crazy amount...

Aachen 5 days ago

Do we know how that relates to actual operating cost? My understanding is that this is priced below cost because we're still in the investor-hype part of the cycle, where they're trying to capture market share by pumping many millions into these companies and projects.

Does this really reflect the resource cost of finding this vulnerability?

remram 4 days ago

It sounds like a crazy amount to me. I can run code analyzers/sanitizers/fuzzers on every commit to my repo at virtually no cost. Would they have caught a problem like this? Maybe not, and certainly not without some false positives. Still, this LLM approach costs orders of magnitude more than previous tooling, and might still have turned up nothing (we just don't read the blog posts about those attempts).

JFingleton 6 days ago

Zero days can go for $$$, or you can go down the bug bounty route and also get $$. The cost of the LLM would be a drop in the bucket.

When the cost of inference gets near zero, I have no idea what the world of cyber security will look like, but it's going to be a very different space from today.

yencabulator 5 days ago

Except in this case the LLM was pointed at a known-to-exist vulnerability. $116 per handler per vulnerability type, unknown how many vulnerabilities exist.

GaggiX 2 days ago

o3 discovered a new zero-day exploit; it wasn't previously known, and it's not the same one found by the author.

roncesvalles 6 days ago

A lot of money is all you need~

bbarnett 6 days ago

A lot of burned coal, is what.

The "don't blame the victim" trope is valid in many contexts. Here the application might be: "hackers are attacking vital infrastructure, so we need to find vulnerabilities first". And hackers use AI now, likely via hacked accounts and for free, to discover vulnerabilities. So we must use AI!

Therefore, the hackers are contributing to global warming. We, dear reader, are innocent.

sdoering 6 days ago

So basically running a microwave for about 800 seconds, or a bit more than 13 minutes, per model?

Oh my god - the world is gonna end. Too bad we panicked over exaggerated energy consumption numbers for using an LLM for individual work.

Yes - when a lot of people do a lot of prompting, these one tenth of a second to 8 seconds of running the microwave per prompt add up. But I strongly suggest that we could all drop our energy consumption significantly by other means, instead of blaming the blog post's author for his.

The "lot of burned coal" is probably not that much in this blog post's case, given that 1 kWh is about 0.12 kg coal equivalent (and yes, I know that we need to burn more than that for 1 kWh - still not that much compared to quite a few other human activities).

If you want to read up on it, James O'Donnell and Casey Crownhart try to pull together a detailed account of AI energy usage for MIT Technology Review.[1] I found that quite enlightening.

[1]: https://www.technologyreview.com/2025/05/20/1116327/ai-energ...
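The microwave comparison above can be put into numbers. A rough sketch, where the 1.2 kW appliance rating is an assumption and 0.12 kg coal per kWh is the figure quoted in the comment:

```python
# Energy sketch for "running a microwave for about 800 seconds".
microwave_kw = 1.2    # assumed power draw of a typical microwave
seconds = 800

kwh = microwave_kw * seconds / 3600   # convert to kilowatt-hours
coal_kg = kwh * 0.12                  # coal-equivalent figure from the comment

print(f"{kwh:.2f} kWh, about {coal_kg:.3f} kg coal equivalent")
```

That works out to roughly a quarter of a kWh, or on the order of 30 grams of coal equivalent per model.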

XorNot 6 days ago

The better answer is just "I don't care".

Because I definitely don't care. Energy expenditure numbers are always used in isolation, lest anyone have to deal with anything real about them, and are always content to ignore the abstraction that electricity is - namely, electricity is not coal. It's electricity. Unlike, say, driving my petrol-powered car, the power for my computers might come from solar panels, coal, nuclear power stations, geothermal, hydro...

Which is to say, if people want to worry about electricity usage: go worry about it by either building more clean energy, or campaigning to raise electricity prices.

sdoering 5 days ago

Funny, I actually care. But I try to direct my care towards the real culprits.

So: about 50% of CO2 emissions in Germany come from 20 sources. Campaigns like the personal footprint (invented by BP) are there to shift the blame to consumers - away from those with the biggest impact and the most options for action.

So yes, I f**ng don’t care if a security researcher leaves his microwave equivalent running for a few minutes. But I care, campaign in the bigger sense and also orient my own consumption wherever possible towards cleaner options.

Full well knowing that, even while being mostly reasonable in my consumption, I definitely belong to the 5-10% of earth's population who drive the problem. Because more than half of the population in the so-called first world lives within the Paris Climate Agreement's budget - and it's not the upper half.

Balooga 6 days ago

Between $3k and $30k to solve a single ARC-AGI problem [1]. Not sure if "100 runs" makes this comparable.

[1] https://techcrunch.com/2025/04/02/openais-o3-model-might-be-...

mcbuilder 5 days ago

I think it gave up trying to solve Pokemon. :) Seriously, aren't these ARC-AGI problems easy for most people? They usually involve some sort of pattern recognition and visual reasoning.

umbra07 6 days ago

And how do you know what the purely-human-driven energy expenditure would have been?

wongarsu 6 days ago

How much longer would OP have needed to find the same vulnerability without LLM help? Then multiply that by the energy used to produce 2000 kcal/day of food, plus the electricity for running their computer.

Usually LLMs come out far ahead in those types of calculations. Compared to humans, they are quite energy efficient.
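As a sketch of the comparison being proposed here - the per-prompt inference energy is an illustrative assumption, not a measured number:

```python
# Human food energy vs. LLM inference, back of the envelope.
# 1 kcal = 4184 J; 1 kWh = 3.6e6 J.
human_kcal_per_day = 2000
human_kwh_per_day = human_kcal_per_day * 4184 / 3.6e6   # food energy in kWh

llm_wh_per_prompt = 3.0   # assumed per-prompt inference energy (illustrative)
prompts_per_human_day = human_kwh_per_day * 1000 / llm_wh_per_prompt

print(f"{human_kwh_per_day:.2f} kWh/day of food, "
      f"roughly {prompts_per_human_day:.0f} prompts' worth")
```

On these (assumed) numbers, a day of human food energy covers hundreds of prompts - which is the shape of the argument, even if the exact per-prompt figure is contested.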

topaz0 5 days ago

Those types of calculation are extremely disingenuous.

sadeshmukh 5 days ago

What exactly is disingenuous about it?

topaz0 5 days ago

It reduces the value of a human life to the incremental rate at which they produce some concrete product. It is absurd.

sadeshmukh 4 days ago

Or, it elevates the tasks artificial intelligence performs to their actual difficulty - the human effort they replace.

topaz0 3 days ago

You're not thinking this through. Your human life (with its associated 2000 Cal/day) does so much more than find bugs in obscure codebases. Or at least, one would hope.

xyst 6 days ago

"100 times for each of the models" represents a significant amount of energy burned. Finding the most common class of vulnerability in C-based codebases becomes less of an achievement and more a celebration of decadence and waste.

We are facing a global climate change event, yet continue to burn resources on trivial shit like it's 1950.