Would be interesting to see a comparison with Qwen 32B; I've found it a fantastic local model (via ollama).
Last year, the question was whether a model would fit in memory at all. This year, inference speed is key.
Proofreading an email at four tokens per second? Great. Spending half an hour deep-researching a topic with artifacts, MCP tools, and reasoning at four tokens per second? A bad time.
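If you want to know what speed you're actually getting, ollama's generate API reports eval_count (tokens generated) and eval_duration (nanoseconds), so tokens/sec falls right out. A minimal sketch, assuming a default local ollama server on port 11434; the model tag is my guess, swap in whichever Qwen 32B quant you pulled:

    import requests

    # Ask the local ollama server for one non-streamed completion,
    # then compute generation speed from the stats it returns.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5:32b",  # assumption: use the tag you actually pulled
            "prompt": "Proofread this email: ...",
            "stream": False,
        },
    )
    stats = resp.json()
    # eval_count = tokens generated; eval_duration = generation time in ns
    tps = stats["eval_count"] / (stats["eval_duration"] / 1e9)
    print(f"{tps:.1f} tokens/sec")

(If you'd rather not script it, `ollama run <model> --verbose` prints the same eval rate after each reply.)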