ipsum2 6 days ago

SAM isn't open-vocabulary, i.e. it segments things without knowing the name of the object, so you can't ask it to "highlight the grapes"; you have to give it an example of a grape first.

ipsum2 6 days ago

This uses GroundingDINO, a separate model, for the open-vocabulary part. Useful nonetheless, but it means you're running a lot of model inference for a single image.
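
To make the two-stage setup concrete, here's a rough sketch of the control flow, not the project's actual code. `text_to_masks`, `fake_detect`, and `fake_segment` are hypothetical names: the detector callable stands in for GroundingDINO (text in, boxes out) and the segmenter callable stands in for SAM prompted with a box. The call count is only a crude proxy for inference cost.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def text_to_masks(image, text: str,
                  detect: Callable[[object, str], List[Box]],
                  segment: Callable[[object, Box], object]):
    """Hypothetical two-stage pipeline: text -> boxes -> masks."""
    # Stage 1: the open-vocabulary detector turns the text prompt into boxes.
    boxes = detect(image, text)
    # Stage 2: the segmenter is prompted once per box; it never sees the text.
    masks = [segment(image, box) for box in boxes]
    # One detector call plus one segmenter call per detected box.
    model_calls = 1 + len(boxes)
    return masks, model_calls


# Toy stand-ins so the sketch runs without any model weights (assumptions).
def fake_detect(image, text):
    return [(0, 0, 2, 2), (3, 3, 5, 4)] if text == "grapes" else []


def fake_segment(image, box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)  # box area as a placeholder for a mask


masks, model_calls = text_to_masks(None, "grapes", fake_detect, fake_segment)
print(masks, model_calls)
```

With real models you'd swap the two stand-ins for actual inference wrappers; the point is just that the text prompt only ever reaches the detector, and the segmenter runs on whatever boxes come back.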