I think that was a joke. New pricing is already in place:
Input: $2.00 / 1M tokens
Cached input: $0.50 / 1M tokens
Output: $8.00 / 1M tokens
https://openai.com/api/pricing/
Now cheaper than gpt-4o and same price as gpt-4.1 (!).
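For a rough sense of what these rates mean per request, a back-of-the-envelope sketch (rates are from the list above; the token counts in the example are invented):

```python
# New o3 rates in $ per 1M tokens, from the pricing page above.
INPUT_RATE = 2.00
CACHED_INPUT_RATE = 0.50
OUTPUT_RATE = 8.00

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Dollar cost of one request at the new o3 rates.

    cached_tokens is the portion of input_tokens served from the
    prompt cache and billed at the discounted rate.
    """
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_RATE
            + cached_tokens * CACHED_INPUT_RATE
            + output_tokens * OUTPUT_RATE) / 1_000_000

# Hypothetical request: 10k prompt tokens (8k of them cached), 2k output.
print(request_cost(10_000, 2_000, cached_tokens=8_000))  # 0.024
```

So a fairly chunky request with good cache hits still lands at a couple of cents; output tokens dominate the bill at 4x the input rate.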
> Now cheaper than gpt-4o and same price as gpt-4.1 (!).
This is where the naming choices get confusing. "Should" o3 cost more or less than GPT-4.1? Which is more capable? Intuitively, generation 3 of a technology feels less advanced than version 4.1 of a (similar) technology.
Do we know parameter counts? The reasoning models have typically been cheaper per token, but use more tokens. Latency is annoying. I'll keep using gpt-4.1 for day-to-day.
o3 is a reasoning model, GPT-4.1 is not. They are orthogonal.
My quibble is with naming choices and differentiating. Even here they are confusing:
- o4 is reasoning
- 4o is not
They simply do not do a good job of differentiating. Unless you work directly in the field, it is likely not obvious what the difference is between "our most powerful reasoning model" and "our flagship model for complex tasks."
"Does my complex task need reasoning or not?" seems to be how one would choose. (What type of task is complex but does not require any reasoning?) This seems less than ideal!
This is true, and I believe apps automatically route requests to appropriate models for normie users.
No, people had tested it after Altman's announcement and had confirmed that they were still being billed at the original price. And I checked the docs ~1h after and they still showed the original price.
The speculation that only input pricing had been lowered arose because yesterday they gave out vouchers for 1M free input tokens while output tokens were still billed.
Thinking models produce a lot of internal output tokens, which makes them more expensive than non-reasoning models for similar prompt and visible-output lengths.
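To make that concrete, a sketch of how hidden reasoning tokens inflate the bill even at identical per-token rates (the $8/1M output rate is from the pricing list upthread; the 5k hidden-token figure is invented for illustration):

```python
OUTPUT_RATE = 8.00  # $ per 1M output tokens (same rate for both models here)

def output_cost(visible_tokens, reasoning_tokens=0):
    """Output-side cost of one response.

    Reasoning tokens are billed as output tokens even though
    they never appear in the visible response.
    """
    return (visible_tokens + reasoning_tokens) * OUTPUT_RATE / 1_000_000

# Same 1k-token visible answer from both models:
plain = output_cost(1_000)                             # non-reasoning model
thinking = output_cost(1_000, reasoning_tokens=5_000)  # + 5k hidden tokens
print(plain, thinking)  # 0.008 0.048
```

Same visible answer, 6x the output cost in this made-up case, which is why "cheaper per token" does not always mean cheaper per answer.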