ed 6 days ago

Neat. Wonder how this compares to Segment Anything (SAM), which also does zero-shot segmentation and performs pretty well in my experience.

RugnirViking 6 days ago

YOLO is way faster. We used to run both, with YOLO finding candidate bounding boxes and SAM segmenting just those.

For what it's worth, YOLO has been a standard in image processing for ages at this point, with dozens of variations on the algorithm (YOLOv3, YOLOv5, YOLOv6, etc.), and this is yet another new one. Looks great tho

SAM wouldn't run in under 1000 ms per frame for most reasonable image sizes.
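
Roughly, the two-stage setup looks like this. A minimal sketch, not our exact code: the checkpoint names are placeholders and the calls assume the public ultralytics and segment-anything APIs.

    # YOLO proposes candidate boxes, SAM only segments inside those boxes
    import cv2
    from ultralytics import YOLO
    from segment_anything import sam_model_registry, SamPredictor

    detector = YOLO("yolov8n.pt")  # fast detector
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
    predictor = SamPredictor(sam)  # heavier segmenter

    image_bgr = cv2.imread("frame.jpg")                     # ultralytics takes BGR numpy input
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)  # SAM expects RGB

    predictor.set_image(image_rgb)  # one expensive image embedding per frame
    masks = []
    for box in detector(image_bgr)[0].boxes.xyxy.cpu().numpy():
        # the box prompt keeps SAM from segmenting the whole frame
        m, _, _ = predictor.predict(box=box, multimask_output=False)
        masks.append(m[0])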

euazOn 6 days ago

Just as a quick demo, here is an example of YOLO-World combined with EfficientSAM: https://youtu.be/X7gKBGVz4vs?t=980

aunty_helen 6 days ago

We used MobileSAM because of this; it was about 250ms on CPU. Useful for our use case

ipsum2 6 days ago

SAM doesn't do open vocabulary, i.e. it segments things without knowing the name of the object, so you can't ask it to "highlight the grapes"; you have to prompt it with an example (a point or box on a grape) first.
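
Concretely, SAM's prompt is a point or a box, not a word; an open-vocabulary detector like YOLO-World is the piece that takes a class name. Rough sketch, assuming the public segment-anything and ultralytics APIs (checkpoints are placeholders):

    import cv2
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor
    from ultralytics import YOLOWorld

    image = cv2.cvtColor(cv2.imread("grapes.jpg"), cv2.COLOR_BGR2RGB)

    # SAM: the prompt is geometric -- a click or a box, no notion of "grape"
    predictor = SamPredictor(sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth"))
    predictor.set_image(image)
    masks, _, _ = predictor.predict(
        point_coords=np.array([[420, 310]]),  # the pixel you clicked on
        point_labels=np.array([1]),           # 1 = foreground click
    )

    # Open-vocabulary detector: the prompt is text, so "grapes" works directly
    detector = YOLOWorld("yolov8s-world.pt")
    detector.set_classes(["grapes"])
    boxes = detector("grapes.jpg")[0].boxes.xyxy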

ipsum2 6 days ago

This uses GroundingDINO, a separate model, for the open-vocabulary part. Useful nonetheless, but it means you're running a lot of model inference for a single image.
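
The usual pattern is GroundingDINO turning the text prompt into boxes and SAM then segmenting those boxes, so a single image goes through at least two networks. A sketch following the GroundingDINO repo's inference utilities (config and weight paths are placeholders):

    from groundingdino.util.inference import load_model, load_image, predict

    # GroundingDINO handles the text; SAM (not shown) would then segment each box
    model = load_model(
        "GroundingDINO_SwinT_OGC.py",   # config path (placeholder)
        "groundingdino_swint_ogc.pth",  # weights path (placeholder)
    )
    image_source, image = load_image("grapes.jpg")
    boxes, logits, phrases = predict(
        model=model,
        image=image,
        caption="grapes",
        box_threshold=0.35,
        text_threshold=0.25,
    )
    # Each box then goes to SAM's predictor: two different models' forward
    # passes before you get a single labeled mask.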