kaycebasques 8 days ago

The post title reminds me of something I researched a little a couple of months back. Practically all embeddings are implemented as vectors, right? Definitionally, an embedding doesn't have to be a vector. But in practice there's not really any such thing as a non-vector embedding, is there?

One thing I learned recently is that, if your embedding model supports task types (clustering, STS, retrieval, etc.), then that can have a non-trivial impact on the generated embedding for a given text: https://technicalwriting.dev/ml/embeddings/tasks/index.html
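
For example, with the Gemini API (the model name and text here are just placeholders), the task type is a single parameter on the embed call, and the same text comes back as a different vector depending on it. A minimal sketch:

    import google.generativeai as genai

    genai.configure(api_key="...")  # your API key

    text = "How do I rotate an API key?"

    # Same text, two task types -> two different vectors.
    query_vec = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
        task_type="retrieval_query",
    )["embedding"]

    cluster_vec = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
        task_type="clustering",
    )["embedding"]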

Parquet and Polars sound very promising for reducing embedding storage requirements. Still haven't tinkered with them: https://minimaxir.com/2025/02/embeddings-parquet/
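
From what I can tell, the workflow would look roughly like this (dimensions and filenames made up):

    import numpy as np
    import polars as pl

    # Stand-in for real model output: 1,000 vectors of dimension 768.
    embeddings = np.random.rand(1000, 768).astype(np.float32)

    df = pl.DataFrame({
        "id": list(range(1000)),
        "embedding": embeddings.tolist(),
    })
    df.write_parquet("embeddings.parquet")  # columnar + compressed on disk

    # Round-trip back to a numpy matrix for similarity math.
    loaded = pl.read_parquet("embeddings.parquet")
    matrix = np.array(loaded["embedding"].to_list(), dtype=np.float32)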

And this post made me much more careful about how exactly I compare embeddings. OP's post seems to do a good job explaining common techniques, too: https://p.migdal.pl/blog/2025/01/dont-use-cosine-similarity/
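
For what it's worth, cosine similarity is just the dot product of L2-normalized vectors, so whether your vectors are already normalized decides which metric you're actually computing. A toy illustration:

    import numpy as np

    def cosine_sim(a, b):
        # Dot product of L2-normalized vectors.
        return float((a / np.linalg.norm(a)) @ (b / np.linalg.norm(b)))

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 4.0, 6.0])  # same direction, twice the magnitude

    print(cosine_sim(a, b))  # 1.0: direction only
    print(float(a @ b))      # 28.0: magnitude-sensitive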

seanhunter 8 days ago

There are lots of non-vector embeddings. “Embedding” just means one mathematical structure stuck inside another in such a way as to preserve its algebraic properties. So for example, the real number line embeds into the Cartesian plane along either axis (e.g. via x ↦ (x, 0)). That’s a non-vector embedding. I think field extensions (like the rationals in the reals) also count as embeddings, although I may be wrong about that.
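
To make the structure-preserving part concrete, here's a toy check for that x-axis inclusion f(x) = (x, 0):

    # f embeds the real line into the plane along the x-axis:
    # it's injective and preserves addition and scaling.
    def f(x):
        return (x, 0.0)

    def add(u, v):
        return (u[0] + v[0], u[1] + v[1])

    def scale(c, u):
        return (c * u[0], c * u[1])

    a, b = 2.0, 3.5
    assert f(a + b) == add(f(a), f(b))     # f(a + b) = f(a) + f(b)
    assert f(4.0 * a) == scale(4.0, f(a))  # f(c * a) = c * f(a)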

In ML, embeddings are used as a way to work with text using the tools of linear algebra. So of course people use (and want to use) vector embeddings, because that’s exactly what lets you “do linear algebra on text”.

kaycebasques 7 days ago

Thank you for the details. I was thinking about ML embeddings in particular, but your answer does help me understand the core mathematical nature of an embedding at a deeper level. For example, the word "embedding" makes more sense now in the context of "one mathematical structure stuck inside another."