Item 44244846

fastball • 2 days ago

Anecdotal, but about a week ago I noticed a sharp drop in o3 performance. For many tasks I will compare Gemini 2.5 Pro with o3, running the same prompt in both. Generally for my personal use o3 and G2.5P have been neck and neck over the last few months, with responses I have been very happy with.

However, starting about a week ago, the o3 responses became noticeably worse, with G2.5P staying about the same (in terms of what I've come to expect from the two models).

This, alongside the news that you guys have decreased the price of o3 by 80%, really does make it feel like you've quantized the model or knee-capped thinking or something. If you say it is wholly unchanged I'll believe you, but I'm not sure how else to explain the (admittedly subjective) performance drop I've experienced.

IanCal • 2 days ago

Are you sure you're using the same models? G2.5P updated almost exactly a week ago.

fastball • 1 day ago

G2.5P might've updated, but that's not the model where I noticed a difference. o3 seemed noticeably dumber in isolation, not just compared to G2.5P.

But yes, perhaps the answer is that about a week ago I subconsciously started asking harder questions, and G2.5P handled them better because it had just been improved, while o3 had not, so it seemed worse. Or perhaps G2.5P has always had more capacity than o3, and I wasn't asking hard enough questions to notice a difference before.