gizmodo59 15 days ago

Having tried both, I'd say o3 is in a league of its own compared to 3.7 or even Gemini 2.5 Pro. The benchmarks may not show a lot of gain, but the difference matters a lot when the task is very complex. What's surprising is that they announced it last November, yet it was only released a month ago? (I'm guessing a lot of safety work took time, but no idea.) Can't wait for o4!

dieortin 14 days ago

All your comment threads from the past months consist of you saying how much better OpenAI products are than the competition, so that doesn't inspire a ton of trust.

gizmodo59 14 days ago

Because in my use cases they are? Coding, math, and science research are my primary use cases, and Codex with o3 (and o3 itself) consistently outperforms the others on complex tasks for me. I can't say a model is better just to appeal to HN. If another model were as good as o3, I'd use it in a second.

sothatsit 14 days ago

I feel similarly. o3 is quite distinct in what it is good at compared to other models.

For example, I think 2.5 Pro and Claude 4 are probably better at programming. But, for debugging, or not-super-well-defined reasoning tasks, or even just as a better search, o3 is in a league of its own. It feels like it can do a wider breadth of tasks than other models.