Item 44147122

Try this: https://github.com/luca-medeiros/lang-segment-anything

ipsum2 • 6 days ago

This uses GroundingDINO, a separate model, for the open-vocabulary part. Useful nonetheless, but it means you're running a lot of model inference for a single image.
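
To make the cost concrete, here's a rough sketch of a detect-then-segment layout, assuming the text prompt goes through a detector first and the resulting boxes are then handed to a segmenter. `StubDetector`, `StubSegmenter`, and `text_to_masks` are hypothetical stand-ins I made up for illustration, not the lang-segment-anything API, and the "masks" are just box areas. The only point is that each image pays for a pass through two separate models.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class StubDetector:
    """Hypothetical stand-in for a text-conditioned detector (GroundingDINO's role)."""
    calls: int = 0

    def detect(self, image_size: Tuple[int, int], prompt: str) -> List[Box]:
        self.calls += 1  # count one detector pass
        w, h = image_size
        # Pretend every prompt matches two fixed regions of the image.
        return [(0, 0, w // 2, h // 2), (w // 2, h // 2, w, h)]


@dataclass
class StubSegmenter:
    """Hypothetical stand-in for a box-prompted segmenter (SAM's role)."""
    calls: int = 0

    def segment(self, image_size: Tuple[int, int], boxes: List[Box]) -> List[int]:
        self.calls += 1  # count one segmenter pass
        # Use the box area as a placeholder for a real mask.
        return [(x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes]


def text_to_masks(image_size, prompt, detector, segmenter):
    boxes = detector.detect(image_size, prompt)   # stage 1: text -> boxes
    masks = segmenter.segment(image_size, boxes)  # stage 2: boxes -> masks
    return boxes, masks


detector, segmenter = StubDetector(), StubSegmenter()
boxes, masks = text_to_masks((640, 480), "dog", detector, segmenter)
total_passes = detector.calls + segmenter.calls
print(total_passes)  # one pass per model for this single image
```

A single model that took the text prompt directly would halve that count in this toy accounting. Real costs depend on model sizes and batching, which the stubs don't capture.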