have you thought about SPLADE models? ex: https://arxiv.org/abs/2109.10086
Looks really interesting, I'll have a proper read. What would be your reasoning to incorporate this if we already have vector functionality and semantic search?
my project deals w/ non-English text, and BM25 performance there is middling. A language-specific sparse model helps.
We will definitely look into it. The SPLADE models look promising!