minimaxir 8 days ago

Since this was oriented toward a Python audience, it may have also been useful to demonstrate on the poster how in Python you can create the embeddings (e.g. using requests or the OpenAI client to hit OpenAI's embeddings API) and calculate the similarities (e.g. using numpy), since most readers won't read the linked notebooks. Mostly as a good excuse to show off Python's rare @ operator for dot products in numpy.
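Something like the following sketch, say. The model name and client usage are illustrative (the API call is commented out and replaced with stand-in vectors so the similarity math runs on its own):

```python
import numpy as np

# Fetching real embeddings would look roughly like this (needs an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.embeddings.create(model="text-embedding-3-small",
#                                 input=["The Lion King", "Frozen"])
# a, b = (np.array(d.embedding) for d in resp.data)

# Stand-in vectors so the example is self-contained:
a = np.array([0.1, 0.3, 0.5])
b = np.array([0.2, 0.1, 0.4])

# @ on 1-D arrays is the dot product; divide by the norms for cosine similarity.
similarity = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(similarity)
```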

As a tangent, what root data source are you using to calculate the movie embeddings?

pamelafox 8 days ago

I intended this blog post to be language-agnostic, but agreed that a Python-specific version would be helpful.

Here's where I calculate cosine without numpy: https://github.com/pamelafox/vector-embeddings-demos/blob/ma...
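A dependency-free cosine similarity looks roughly like this (a minimal sketch, not the notebook's exact code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors, no numpy needed."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)
```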

And in the distance notebook, I calculate with numpy: https://github.com/pamelafox/vector-embeddings-demos/blob/ma... I didn't use the @ operator! TIL.
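For anyone else who hadn't seen it: on 1-D numpy arrays, @ is just another spelling of np.dot.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# These two are equivalent for 1-D arrays:
print(np.dot(a, b))  # 32.0
print(a @ b)         # 32.0
```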

I forget where I originally got the Disney movie titles, but notably it's just the titles. A better ranking would also draw on the movie synopses. Here's where I calculated their embeddings using OpenAI: https://github.com/pamelafox/vector-embeddings-demos/blob/ma...

Maybe I can submit a poster to PyTorch that would include the Python code as well.