esafak 2 days ago

Are there any benchmarks that track historical performance?

behnamoh 2 days ago

Good question, and I don't know of any, although it's a no-brainer that someone should make one.

A proxy for that may be the anecdotal reports from users who come back a month later saying that model X has gotten dumber (this started with GPT-4 and keeps happening, especially with Anthropic and OpenAI models). I haven't heard such anecdotes about Gemini, R1, etc.

SparkyMcUnicorn 2 days ago

Aider has one, but it hasn't been updated in months. People kept claiming models were getting worse, but the results proved that they weren't.

__mharrison__ 2 days ago
vitaflo 2 days ago

That Deepseek price is always hilarious to see in these charts.

SparkyMcUnicorn 2 days ago

That's not the one I'm referring to. See my other comments or your sibling comment.