arnaudsm 2 days ago

I wish the charts included Qwen3, the current SOTA in reasoning.

Qwen3-4B almost beats Magistral-22B on the 4 available benchmarks, and Qwen3-30B-A3B is miles ahead.

SparkyMcUnicorn 2 days ago

30-A3B is a really impressive model.

I throw tasks at it running locally to save on API costs, and it's possibly better than anything we had a year or so ago from closed-source providers. For programming tasks, I'd rank it higher than gpt-4o.

freehorse 1 day ago

It is a great model, and blazing fast, which is actually very useful especially for "reasoning" models, as they produce a lot of tokens.

I wish Mistral would get back to making MoE models. I loved their Mixtral 8x7B; it was one of the greatest models I could run at the time it came out, but it is outdated now. I wish somebody were making a similar-size MoE model, which could comfortably sit in a 64GB RAM MacBook and be fast. Currently Qwen3-30B-A3B is the only one I know of, but it would be nice to have something slightly bigger/better (including a non-reasoning base one). All the other MoE models are just too big to run locally on more standard hardware.
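The "fits comfortably in 64GB" claim comes down to simple arithmetic: an MoE model's full weight set must live in memory even though only a few billion parameters are active per token (that's what makes it fast). A rough sketch, assuming 4-bit quantization, which is a common setup for local inference:

```python
def quantized_weight_size_gb(total_params_billions: float,
                             bits_per_weight: float = 4.0) -> float:
    """Approximate in-memory weight size in GB for a quantized model.

    Only total parameter count matters for memory: in an MoE model,
    all experts' weights are resident even though few are active per token.
    """
    bytes_per_weight = bits_per_weight / 8
    return total_params_billions * bytes_per_weight  # 1e9 params * bytes / 1e9

# Qwen3-30B-A3B: ~30B total parameters, ~3B active per token.
print(quantized_weight_size_gb(30))        # ~15 GB at 4-bit: fits in 64GB easily
print(quantized_weight_size_gb(30, 8.0))   # ~30 GB at 8-bit: still fits
print(quantized_weight_size_gb(141, 4.0))  # a Mixtral-8x22B-scale model: ~70 GB, too big
```

The 3B active parameters are what make it "blazing fast" (less compute and memory bandwidth per token), while the 30B total is what sets the RAM floor; KV cache and runtime overhead add a few GB on top of the weight figure.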

poorman 2 days ago

Is there a popular benchmark site people use? Because I had to test all these by hand, and `Qwen3-30B-A3B` still seems like the best model I can run in that relative parameter space (/memory requirements).

arnaudsm 2 days ago

- https://livebench.ai/#/ + AIME + LiveCodeBench for reasoning

- MMLU-Pro for knowledge

- https://lmarena.ai/leaderboard for user preference

So far we only have Magistral's GPQA, AIME, and LiveCodeBench scores.

resource_waste 2 days ago

No surprise on my end. Mistral has been basically useless to me because other models have always been better.

But it's European, so it's a point of pride.

Relevant or not, we will keep hearing the name as a result.

devmor 2 days ago

I would agree, Qwen3 is definitely the most impressive "reasoning" model I've evaluated so far.