jokethrowaway 6 days ago

You can build a pipeline where you use: GroundingDINO (text-prompted object detection) -> SAM (segmentation) -> a Stable Diffusion inpainting model. I mainly work with real photos, so I like to start with realisticVisionV60B1_v51HyperVAE-inpainting and then swap it out if I have some special use case.
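A minimal sketch of that flow with Hugging Face transformers + diffusers. The model IDs, thresholds, and the box-padding helper are my assumptions, not part of the comment above — swap in whatever GroundingDINO/SAM/inpainting checkpoints you prefer (e.g. load realisticVisionV60B1_v51HyperVAE-inpainting with StableDiffusionInpaintPipeline.from_single_file):

```python
import numpy as np


def box_to_mask(box, height, width, pad=16):
    """Turn one (x0, y0, x1, y1) detection box into a binary inpainting
    mask, padded by `pad` pixels so the inpainting model sees a little
    context around the object. pad=16 is an assumed default."""
    x0, y0, x1, y1 = box
    x0 = max(int(x0) - pad, 0)
    y0 = max(int(y0) - pad, 0)
    x1 = min(int(x1) + pad, width)
    y1 = min(int(y1) + pad, height)
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 255
    return mask


def replace_object(image, detect_prompt, fill_prompt):
    """GroundingDINO -> SAM -> SD inpainting, as in the comment above.
    Heavy imports stay inside the function; checkpoints are assumptions."""
    import torch
    from PIL import Image
    from transformers import (AutoModelForZeroShotObjectDetection,
                              AutoProcessor, SamModel, SamProcessor)
    from diffusers import StableDiffusionInpaintPipeline

    # 1. Text-prompted detection (GroundingDINO expects a "."-terminated prompt).
    dino_id = "IDEA-Research/grounding-dino-base"
    processor = AutoProcessor.from_pretrained(dino_id)
    dino = AutoModelForZeroShotObjectDetection.from_pretrained(dino_id)
    inputs = processor(images=image, text=f"{detect_prompt}.", return_tensors="pt")
    with torch.no_grad():
        outputs = dino(**inputs)
    results = processor.post_process_grounded_object_detection(
        outputs, inputs.input_ids,
        box_threshold=0.35, text_threshold=0.25,  # assumed thresholds
        target_sizes=[image.size[::-1]])
    box = results[0]["boxes"][0].tolist()  # best box for the prompt

    # 2. Segmentation (SAM), prompted with the detection box.
    sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
    sam = SamModel.from_pretrained("facebook/sam-vit-base")
    sam_inputs = sam_processor(image, input_boxes=[[box]], return_tensors="pt")
    with torch.no_grad():
        sam_out = sam(**sam_inputs)
    masks = sam_processor.image_processor.post_process_masks(
        sam_out.pred_masks, sam_inputs["original_sizes"],
        sam_inputs["reshaped_input_sizes"])
    mask = Image.fromarray(masks[0][0, 0].numpy().astype(np.uint8) * 255)

    # 3. Inpainting. Stand-in checkpoint; use from_single_file() for a
    # downloaded .safetensors like the realisticVision one.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting").to("cuda")
    return pipe(prompt=fill_prompt, image=image, mask_image=mask).images[0]
```

If SAM's mask hugs the object too tightly for clean inpainting, box_to_mask (or dilating the SAM mask) gives the model extra context around the edges.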

For higher quality, at the cost of more VRAM, you can also use Flux.1 Fill to do the inpainting.
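For the Flux.1 Fill route, a minimal diffusers sketch. The repo ID and guidance value follow the model card; the dimension-snapping helper and its multiple-of-16 assumption are mine (Flux packs 2x2 latent patches, so sides divisible by 16 avoid resize surprises):

```python
def snap_to_multiple(width, height, m=16):
    """Round dimensions to the nearest multiple of m. m=16 is an assumed
    safe value for Flux's 2x2-packed latents."""
    snap = lambda v: max(m, round(v / m) * m)
    return snap(width), snap(height)


def flux_fill(image, mask, prompt):
    """Inpaint with Flux.1 Fill via diffusers' FluxFillPipeline.
    Sketch only: needs a GPU with enough VRAM for the bf16 weights."""
    import torch
    from diffusers import FluxFillPipeline

    pipe = FluxFillPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    w, h = snap_to_multiple(*image.size)
    return pipe(
        prompt=prompt,
        image=image.resize((w, h)),
        mask_image=mask.resize((w, h)),
        guidance_scale=30.0,        # model-card default for Fill
        num_inference_steps=50,
    ).images[0]
```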

Lastly, Flux.1 Kontext [dev] is going to be released soon, and it promises to replace this entire flow, with better prompt understanding. HN thread here: https://news.ycombinator.com/item?id=44128322

silentsea90 15 hours ago

Thanks! I do use GroundingDino + SAM2, but haven't tried realisticVisionV60B1_v51HyperVAE-inpainting! Will do! And will try flux kontext too. Thanks!