I know it’s taboo to ask, but I must: where’s the dataset from? Very eager to play around with audio models myself, but I find existing datasets limiting.
Why would that be a taboo question to ask? It should be the question we always ask when presented with a model, and in some cases we should probably reject the model based on that information.
Because generally the person asking this question is trying to cancel the model maker
Or, by replying, you expose yourself to handing -proof- of the origins of the training dataset to a copyright owner wanting to sue you next.
Well, presumably since they're individuals and not a business, the legal consequences are much less severe. Public opinion still won't be great, but since when was it ever, for any new thing?
If I cut up a song or TV show & put it on YouTube (and screech about fair use/parody law) then that's fine, but people will balk at something like this.
AI is here, people.
No. It's for giving credit where credit is due. And yes, that includes the question of whether the people who generated the training data in the first place have consented to its use for AI training.
It's quite concerning that the community around here is usually livid about violations of FOSS licenses, which typically use copyright law as leverage, but is somehow perfectly OK with training models on copyrighted work and just labels that as "fair use".
What AI tools have you used recently? Have you verified whether the models they all use were trained on copyrighted material with permission?
Ah, that's a classic. "How can you criticize Big Oil and at the same time drive a car!" and voila, the case is closed.
I am allowed to criticize things without having to live like a hermit. I make moderate use of ChatGPT, yet at the same time I think that its training does not fall under fair use, and that creators should get compensated. If OpenAI's business model does not allow for this, then it should fail, and that's fine by me. I lived without ChatGPT, and I can live without it again.
I suspect podcasts, as you have a huge amount of transcribed data with good diction and mic quality. The voices sound like podcast voices to me.