Has anyone noticed that OpenAI has become "lazy"? When I ask questions now, it won't give me a complete file or fix. Instead it tells me what I should do, and I have to ask a second or third time to get it to just do the thing I asked.
I don't see this happening with, for example, DeepSeek.
Is it possible they are saving on resources by having it answer that way?
Yeah, our models are sometimes too lazy. It’s not intentional, and future models will be less lazy.
When I worked at Netflix I sometimes heard the same speculation about intentionally bad recommendations, which people theorized would lower streaming and increase profit margins. It made even less sense there as streaming costs are usually less than a penny. In reality, it’s just hard to make perfect products!
(I work at OpenAI.)
Please be careful about the alternative. I’ve seen o3 doing excessive tool calls and research for relatively simple problems.
Yep, it defaults to doing a web search even when that doesn't make sense.
For example, I asked it to write something, and then asked it to give me that blob of text in Markdown format, so everything it needed was already in the conversation. It still took a whole minute of doing web searches and whatnot.
I actually dislike using o3 for this reason and keep the default set to 4o. But sometimes I forget to switch back, and it goes off boiling the ocean to answer a simple question. It's a bit too trigger-happy with that. In general, all this version and model soup is impossible for non-technical users to figure out. And I've noticed 4o is now sometimes starting to do the same. I guess too many users never use the model dropdown.
After the last few weeks, where o3 seems desperate to do tool searches or re-crunch a bad generation even though I only asked a question about it, I assumed the policy was to burn through credits at the fastest possible rate. With this price change, I don't know what's happening now...
Are they actually profitable? A policy to burn through credits only makes sense if they're making a profit on each token; otherwise it would be counterproductive.
That was a problem in GPT-4 Turbo as well...
IMO it's just that the models are very nondeterministic, and people get very different kinds of responses from them. I've met a number of people who tried it when it first came out, got useless answers, and stopped trying it. Other people (including me) got gobsmackingly great responses, and it felt like AGI was around the corner. But after enough coin flips your luck runs out and you get some lazy responses. Some people have more luck than others and wonder why everyone around them says it's trash.
GPT-4 Turbo had some major "laziness" problems, like really major ones. I posted about this a year back: https://news.ycombinator.com/item?id=39985596#39987726
I'm not saying they haven't improved the laziness problem, but anecdotally it does still happen. I even got a similar sort of "lazy" response for something I'm building with gemini-2.5-flash.
I think it's good. The model will probably make some mistakes at first. Not doing the whole thing and just telling the user the direction it's going in gives us a chance to correct those mistakes.
But maybe you're saying that because you're a CIA plant trying to make the product bad, for complex reasons.
takes tinfoil hat off
Oh, nvm, that makes sense.
Can you share what the main challenges are that OpenAI has been facing in terms of increasing access to top-tier, non-lazy models?
Had a fun experience the other day asking "make a graph of [X] vs [Y]" (some chemistry calculations), and the response was blah blah blah explain explain "let me know if you want a graph of this!" Yeah ok thanks for offering.