I happen to have a 3-hour recording that needs transcription, and I couldn't get a good result with Whisper. It has 3 special characteristics:
- Huge size (400MB). It can be split, but then I still want a single text file with correct timestamps.
- There are 3 speakers, and one of them is far from the microphone and speaks in a low voice. Whisper sometimes ignores this speaker.
- The last and most difficult one is that 2 languages are used at the same time. The same speaker might use Dutch or English, and even mix both in one sentence.
Is there a way to deal with all that?
Whisper large-v3 should be able to handle multiple languages in the same audio. Have you tried that one?
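
For the size problem, one possible approach is to split the recording, transcribe each piece separately, and then shift every piece's timestamps by that piece's start offset before joining everything into one file. Below is a minimal Python sketch of the merge step only. It assumes each chunk's segments are dicts with `start`, `end` and `text` keys (the shape the openai-whisper package returns under `result["segments"]`); the function names and the example offsets are made up for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def merge_chunks(chunks):
    """Merge per-chunk segments into one list with global timestamps.

    `chunks` is a list of (offset_seconds, segments) pairs, where
    offset_seconds is where the chunk starts in the full recording and
    each segment's start/end are relative to the chunk (assumed shape).
    """
    merged = []
    # Process chunks in recording order, whatever order they arrive in.
    for offset, segments in sorted(chunks, key=lambda c: c[0]):
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"].strip(),
            })
    return merged


def to_text(merged):
    """Produce one line per segment for a single output text file."""
    return "\n".join(
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text']}"
        for s in merged
    )


# Hypothetical example: two 30-minute chunks, given out of order.
chunks = [
    (1800.0, [{"start": 0.0, "end": 2.5, "text": " Goedemorgen, shall we start?"}]),
    (0.0, [{"start": 1.0, "end": 3.0, "text": " Hello everyone."}]),
]
merged = merge_chunks(chunks)
```

Calling `to_text(merged)` gives the combined transcript, which can then be written to a single file. Splitting at silences rather than at fixed times would avoid cutting a sentence in half, but that is outside this sketch.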