You trust their PR statements?
It's not a PR statement, it's a change in price. Literally putting money where the mouth is.
Or they are trying to gobble up market share because Anthropic has been much better than OpenAI.
Providers are exceptionally easy to switch. There's no moat for enterprise-level usage. There's no "market share" to gobble up because I can change a line in my config, run the eval suite, and switch immediately to another provider.
This is marginally less true for embedding models and things you've fine-tuned, but only marginally.
o3 probably used to have a HUGE profit margin on inference, so I'd say it's unclear how much optimization was actually done.
I find it pretty plausible they got an 80% speedup just by making optimized kernels for everything. Even when GPUs say they're being 100% utilized, there are so many improvements to be made, like:
- Carefully interleaving shared-memory loads with computation, and overlapping the kernel as a whole with global-memory loads.
- Warp shuffling for softmax (rough sketch after this list).
- Avoiding shared-memory bank conflicts in matrix multiplication.
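To make the warp-shuffle point concrete, here's a minimal CUDA sketch (not anything from OpenAI's stack, obviously): one warp owns one row of logits, and the max and sum reductions needed for a numerically stable softmax stay entirely in registers via __shfl_xor_sync instead of bouncing through shared memory. The row length, the one-row-per-warp mapping, and the toy launch in main are all assumptions for illustration.

```
// Sketch: one warp computes softmax over one row, reducing max/sum via warp shuffles.
#include <cstdio>
#include <cstdlib>
#include <math.h>
#include <cuda_runtime.h>

constexpr int WARP_SIZE = 32;
constexpr int ROW_LEN   = 1024;   // assumed row length, multiple of 32

__device__ float warp_reduce_max(float v) {
    // Butterfly reduction: after log2(32) steps every lane holds the max.
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1)
        v = fmaxf(v, __shfl_xor_sync(0xffffffff, v, offset));
    return v;
}

__device__ float warp_reduce_sum(float v) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1)
        v += __shfl_xor_sync(0xffffffff, v, offset);
    return v;
}

__global__ void softmax_rows(const float* __restrict__ in,
                             float* __restrict__ out, int rows) {
    int row  = blockIdx.x;    // one block = one warp = one row
    int lane = threadIdx.x;   // 0..31
    if (row >= rows) return;
    const float* x = in  + row * ROW_LEN;
    float*       y = out + row * ROW_LEN;

    // Pass 1: row max (for numerical stability), strided across lanes.
    float m = -INFINITY;
    for (int i = lane; i < ROW_LEN; i += WARP_SIZE) m = fmaxf(m, x[i]);
    m = warp_reduce_max(m);

    // Pass 2: sum of exponentials.
    float s = 0.f;
    for (int i = lane; i < ROW_LEN; i += WARP_SIZE) s += expf(x[i] - m);
    s = warp_reduce_sum(s);

    // Pass 3: normalize (a real kernel would cache the exps in registers).
    for (int i = lane; i < ROW_LEN; i += WARP_SIZE) y[i] = expf(x[i] - m) / s;
}

int main() {
    const int rows = 4;
    size_t bytes = rows * ROW_LEN * sizeof(float);
    float *h = (float*)malloc(bytes), *d_in, *d_out;
    for (int i = 0; i < rows * ROW_LEN; ++i) h[i] = (i % 7) * 0.1f;
    cudaMalloc(&d_in, bytes); cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);
    softmax_rows<<<rows, WARP_SIZE>>>(d_in, d_out, rows);
    cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost);
    float sum = 0.f; for (int i = 0; i < ROW_LEN; ++i) sum += h[i];
    printf("row 0 sums to %f (should be ~1)\n", sum);
    cudaFree(d_in); cudaFree(d_out); free(h);
    return 0;
}
```

The point isn't this particular kernel, it's that each of these register-level tricks shaves a little latency, and across an entire serving stack they compound.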
I'm sure the guys at ClosedAI have many more optimizations they've implemented ;). They're probably eventually going to design their own chips or use photonic chips for lower energy costs, but there are still a lot of gains to be made in the software.
Yes, I agree that it's very plausible. But it's just unclear whether it's more of a business decision or a real downstream effect of engineering optimizations (which I assume are happening every day at OA).
Seems more likely to me than them deciding to take a sizable loss on inference by dropping prices by 80% for no reason.
Optimizing serving isn't unlikely: all of the big AI vendors keep finding new efficiencies; it's been an ongoing trend over the past two years.
This is my sense as well. You don't drop prices 80% on a random Tuesday based on scale; you do it with the explicit goal of gaining market share at the expense of $$.