My quibble is with the naming choices and how the models are differentiated. Even here they are confusing:
- o4 is reasoning
- 4o is not
They simply do not do a good job of differentiating them. Unless you work directly in the field, it is probably not obvious what the difference is between "our most powerful reasoning model" and "our flagship model for complex tasks."
"Does my complex task need reasoning or not?" seems to be how one would choose. (What type of task is complex but does not require any reasoning?) This seems less than ideal!
This is true, and I believe the apps automatically route requests to the appropriate model for normie users.