WaltPurvis 7 days ago

>Not to mention most of it basically died when most models started supporting +1M context.

Do most models support that much context? I don't think anything close to "most" models support 1M+ context. I'm only aware of Gemini, but I'd love to learn about others.

fkyoureadthedoc 7 days ago

GPT 4.1 / mini / nano

consumer451 7 days ago

As the context grows, all LLMs appear to turn into idiots, even just at 32k!

> We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 10 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%.

https://news.ycombinator.com/item?id=44107536
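To put the quoted GPT-4o numbers in relative terms, here's a quick back-of-the-envelope calc (only the two figures from the abstract, nothing new):

```python
# Long-context retention relative to the short-context (<1K) baseline,
# using only the GPT-4o figures quoted above.
baseline = 99.3  # near-perfect short-context score
long_ctx = 69.7  # score once the context is extended
print(f"retention: {long_ctx / baseline:.1%}")  # ~70.2% of its own baseline
```

So even the best-performing exception keeps only about 70% of its short-context performance.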

rs186 7 days ago

This paper is slightly outdated by LLM standards -- neither GPT 4.1 nor Gemini 2.5 had been released at that time.

consumer451 7 days ago

Yes, I mentioned that in the comment in the linked post. I wish someone were running this methodology as an ongoing project for new models (rough sketch of what I mean below).

Ideally, shouldn't this be a metric included on every model card? It seems crucial.
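A minimal sketch of that kind of ongoing tracker, assuming you already have a long-context task suite to plug in; the model names, context lengths, and evaluate() callback are all placeholders, not anything from the paper:

```python
# Hypothetical harness for tracking long-context degradation as new models appear.
# evaluate() stands in for whatever long-context suite you trust
# (needle-in-a-haystack, long-document QA, etc.).
from typing import Callable, Dict, List

CONTEXT_LENGTHS = [1_000, 8_000, 32_000, 128_000]

def degradation_report(
    models: List[str],
    evaluate: Callable[[str, int], float],  # (model, context_length) -> accuracy in [0, 1]
) -> Dict[str, Dict[int, float]]:
    """Score each model at each context length, normalized to its short-context baseline."""
    report = {}
    for model in models:
        baseline = evaluate(model, CONTEXT_LENGTHS[0])
        report[model] = {
            length: (evaluate(model, length) / baseline) if baseline else 0.0
            for length in CONTEXT_LENGTHS
        }
    return report

# Fake evaluator that just decays with length, only to show the output shape.
if __name__ == "__main__":
    fake = lambda model, length: 1.0 - 0.15 * CONTEXT_LENGTHS.index(length)
    for model, scores in degradation_report(["model-a", "model-b"], fake).items():
        print(model, {k: round(v, 2) for k, v in scores.items()})
```

Normalizing each score to the model's own short-context baseline is the point: it separates "this model is weak overall" from "this model falls apart as context grows."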