Also worth mentioning is Stable-TS: https://github.com/jianfch/stable-ts
Out of the box it can transcribe with Whisper or Faster-Whisper, but it can also align audio with an existing human-written transcript, adding timing information without losing the transcript's accuracy. I really needed that last feature, and my own attempt at building it turned out much worse, so I'm glad I found this.
I self-host it on Modal.com, as some other commenters do.