semi-extrinsic 1 day ago

Would also be interesting to compare with R1-0528-Qwen3-8B (chain-of-thought distilled from DeepSeek-R1-0528 and post-trained into Qwen3-8B). It scores 86 and 76 on AIME 2024 and 2025 respectively.

Currently running the 6-bit XL quant on a single old RTX 2080 Ti and I'm quite impressed TBH. Simply wild for a sub-8GB download.

saratogacx 1 day ago

I have the same card on my machine at home, what is your config to run the model?

semi-extrinsic 1 day ago

Downloaded the GGUF file from unsloth, then ran llama-cli from llama.cpp with that file as an argument.

IIUC, nowadays the GGUF file itself carries a Jinja-templated metadata struct, which contains the chat template and other config.
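For reference, a minimal invocation sketch along these lines. The filename is an assumption; substitute whatever unsloth quant you actually downloaded:

```shell
# Hypothetical filename; use your actual unsloth Q6_K_XL download.
# -ngl 99 offloads all layers to the GPU; --jinja tells llama-cli to use
# the chat template embedded in the GGUF metadata.
llama-cli -m DeepSeek-R1-0528-Qwen3-8B-UD-Q6_K_XL.gguf -ngl 99 --jinja \
  -p "Why is the sky blue?"
```

With all layers offloaded, an 8B model at ~6-bit quantization fits comfortably in the 2080 Ti's 11 GB of VRAM.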

danielhanchen 1 day ago

I'm surprised it does so well too - that's pretty cool to see!