Item 43797678

gojomo • 1 day ago

Some might prefer the fidelity of this method's 70% savings over the lossyness of 4-bit quantization's 75%.

And, maybe the methods stack for those willing to trade both costs for the smallest representation.

svachalek • 1 day ago

This is only a 30% savings, which is a cool technical feat but hard to see a use case for.