> Now cheaper than gpt-4o and same price as gpt-4.1 (!).
This is where the naming choices get confusing. "Should" o3 cost more or less than GPT-4.1? Which is more capable? A generation 3 of a technology intuitively feels less advanced than a version 4.1 of a similar technology.
Do we know parameter counts? The reasoning models have typically been cheaper per token, but they use more tokens per request. The latency is annoying too. I'll keep using gpt-4.1 for day-to-day work.
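To make that "cheaper per token, but more tokens" tradeoff concrete, here is a minimal sketch. The prices and token counts are made-up assumptions for illustration, not actual OpenAI pricing:

```python
# Illustrative sketch of the per-token price vs. total-token tradeoff.
# All prices and token counts below are made-up assumptions, not real pricing.

def request_cost(price_per_million_output: float, output_tokens: int) -> float:
    """Cost of one request, counting only output tokens for simplicity."""
    return price_per_million_output * output_tokens / 1_000_000

# Hypothetical non-reasoning model: higher per-token price, terse answer.
flagship_cost = request_cost(price_per_million_output=10.0, output_tokens=500)

# Hypothetical reasoning model: lower per-token price, but it emits a long
# chain of reasoning tokens before the final answer.
reasoning_cost = request_cost(price_per_million_output=8.0, output_tokens=4_000)

print(f"flagship:  ${flagship_cost:.4f} per request")   # ~$0.0050
print(f"reasoning: ${reasoning_cost:.4f} per request")  # ~$0.0320
```

Even with a lower sticker price per token, the reasoning model can end up several times more expensive per request once its extra tokens are counted.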
o3 is a reasoning model, GPT-4.1 is not. They are orthogonal.
My quibble is with the naming choices and the lack of differentiation. Even here the names are confusing:
- o4 is reasoning
- 4o is not
They simply do not do a good job of differentiating. Unless you work directly in the field, it is likely not obvious what the difference is between "our most powerful reasoning model" and "our flagship model for complex tasks."
"Does my complex task need reasoning or not?" seems to be how one would choose. (What type of task is complex but does not require any reasoning?) This seems less than ideal!
This is true, and I believe apps automatically route requests to appropriate models for normie users.
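A hypothetical version of that routing might look something like the sketch below. The model names are real, but the keyword heuristic and the `route_request` helper are entirely invented for illustration; this is not how ChatGPT or any OpenAI product actually picks a model.

```python
# Hypothetical sketch of per-request model routing. The classification
# heuristic is invented for illustration; a real product would presumably
# use a trained classifier or explicit user selection instead.

REASONING_HINTS = ("prove", "step by step", "debug", "plan", "derive")

def route_request(prompt: str) -> str:
    """Pick a model name for a prompt using a crude keyword heuristic."""
    needs_reasoning = any(hint in prompt.lower() for hint in REASONING_HINTS)
    return "o3" if needs_reasoning else "gpt-4.1"

print(route_request("Summarize this email in two sentences."))     # gpt-4.1
print(route_request("Debug this race condition and plan a fix."))  # o3
```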