This thing is crazy fast.
They have a deal with Cerebras for inference.
For me this is more important than quality. I love fast responses; it feels more futuristic.
What are you using LLMs for?
Mostly "How do you say $term in English", "what is $thing", reviewing a message for clarity, cleaning up data, parsing screenshots, and coding.
I see! So for these, you tend to find the accuracy "good enough" on the faster-but-less-accurate models.
I generally find the same thing for simple definitions/translations and other "chat" tasks. I'm a bit surprised you also find it so for coding, but otherwise I think I get it.