YOLO is way faster. We used to run both, with YOLO finding candidate bounding boxes and SAM segmenting just those.
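A minimal sketch of that detect-then-segment cascade, assuming the detector and segmenter are plain callables passed in. `Box`, `cascade`, `fake_detect`, `fake_segment`, and the `min_score` threshold are hypothetical names chosen for illustration. The stubs stand in for real YOLO and SAM models, and no real library API is used:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Box:
    # Hypothetical detector output: corner coordinates, confidence, class label.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: str


def cascade(
    image: object,
    detect: Callable[[object], Sequence[Box]],
    segment: Callable[[object, Box], object],
    min_score: float = 0.5,  # assumed confidence cutoff, not from the thread
) -> List[Tuple[Box, object]]:
    """Run the fast detector on the whole frame, then the slow segmenter only on confident boxes."""
    results = []
    for box in detect(image):
        if box.score < min_score:
            continue  # skip weak candidates so the expensive segmenter runs less often
        mask = segment(image, box)  # the box acts as the prompt for the segmenter
        results.append((box, mask))
    return results


# Stub models standing in for YOLO (detector) and SAM (segmenter).
def fake_detect(image):
    return [Box(0, 0, 10, 10, 0.9, "dog"), Box(5, 5, 8, 8, 0.3, "cat")]


calls = []


def fake_segment(image, box):
    calls.append(box.label)  # record which boxes actually reached the segmenter
    return f"mask:{box.label}"


out = cascade(None, fake_detect, fake_segment)
print([mask for _, mask in out])  # ['mask:dog']
```

The point of the design is that the segmenter's cost scales with the number of confident boxes rather than with the full frame, which is why pairing a fast detector with a slower segmenter pays off.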
For what it's worth, YOLO has been a standard in image processing for ages at this point, with dozens of variations on the algorithm (YOLOv3, YOLOv5, YOLOv6, etc.), and this is yet another new one. Looks great, though.
SAM wouldn't run in under 1000 ms per frame for most reasonable image sizes.
Just as a quick demo, here is an example of YOLO-World combined with EfficientSAM: https://youtu.be/X7gKBGVz4vs?t=980
We used MobileSAM because of this; it took about 250 ms per frame on CPU, which was fine for our use case.
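To put the latencies mentioned in this thread in frame-rate terms: assuming frames are processed one at a time with no batching or pipelining, throughput is simply 1000 divided by the per-frame latency in milliseconds. The helper name `fps` is hypothetical, and the 1000 ms figure is a lower bound on SAM's latency, so the resulting 1 fps is an upper bound:

```python
def fps(latency_ms: float) -> float:
    """Frames per second for sequential, one-frame-at-a-time processing."""
    return 1000.0 / latency_ms


# Latencies as reported in the comments above.
for name, ms in [("SAM (at least)", 1000.0), ("MobileSAM on CPU (about)", 250.0)]:
    print(f"{name}: {ms:.0f} ms/frame -> {fps(ms):.1f} fps")
```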