Is there a popular benchmark site people use? Because I had to test all of these by hand, and `Qwen3-30B-A3B` still seems like the best model I can run in that parameter-count (and memory-requirement) range.
- https://livebench.ai/#/ + AIME + LiveCodeBench for reasoning
- MMLU-Pro for knowledge
- https://lmarena.ai/leaderboard for user preference
So far we only have Magistral's GPQA, AIME, and LiveCodeBench results.