Item 44243223

beering • 2 days ago

The o3-preview test was run with very expensive amounts of compute, right? I remember it was north of $10k, so it makes sense that it did better.

Bjorkbat • 2 days ago

The point remains, though: they crushed the benchmark using a specialized model that you'll probably never have access to, whether personally or through a company.

They inflated expectations and then released a model to the public that underperforms.

throwaway314155 • 2 days ago

They revealed the price points for running those evaluations. IIRC the "high" reasoning level cost tens of thousands of dollars, if not more. I don't think they really inflated expectations. In fact, a lot of what we learned is that ARC-AGI probably isn't a very good AGI evaluation (it claims not to be one, but the name suggests otherwise).