jokethrowaway 5 days ago

Looking forward to try. My current go-to solution is E5-F2 (great cloning, decent delivery, ok audio quality, a lot of incoherence here and there forcing you to do multiple generations).

I've just been massively disappointed by Sesame's CSM: on their gradio on the website it was generating flawless dialogs with amazing voice cloning. When running it local the voice cloning performance is awful.

1
toebee 5 days ago

Thanks for the interest! We also enjoyed using E5-F2 :) You can try it now on HF Spaces: https://huggingface.co/spaces/nari-labs/Dia-1.6B