Unless you are encoding very, very small snippets of audio, you are almost certainly better off writing your encoder in a best-of-breed language and shelling out to that program. NIFs aren't for "things that Erlang is slow at"; they're for "things that Erlang is slow at that you need in the same process". You don't need your audio encoding to be in the same process, and you really don't want to try to write audio encoding within the restrictions of NIFs.
I had some trouble getting the NIF stuff working, so I ended up using ports with Rust. The overhead for something like that is basically nothing, and you are right, I do not need this to be in-process.
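
For anyone curious what the Rust side of a port can look like, here is a minimal sketch, not the actual code from this project. It assumes the Erlang side opens the port with something like `open_port({spawn_executable, Path}, [{packet, 4}, binary])`, so every message in both directions carries a 4-byte big-endian length prefix. The `encode` function is a hypothetical placeholder that just echoes its input; a real encoder would go there.

```rust
use std::io::{self, Read, Write};

/// Read one frame: a 4-byte big-endian length, then that many payload bytes.
/// Hitting EOF while reading the header is treated as end of stream
/// (the Erlang side closed the port) and returns Ok(None).
fn read_frame<R: Read>(input: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    match input.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(Some(payload))
}

/// Write one frame using the same 4-byte big-endian length prefix.
fn write_frame<W: Write>(output: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    output.write_all(&(payload.len() as u32).to_be_bytes())?;
    output.write_all(payload)?;
    // Flush so the reply is not left sitting in a buffer.
    output.flush()
}

/// Hypothetical placeholder for the real audio encoder: echoes the input.
fn encode(pcm: &[u8]) -> Vec<u8> {
    pcm.to_vec()
}

/// Answer each incoming frame with one encoded frame until the input ends.
fn serve<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    while let Some(frame) = read_frame(&mut input)? {
        write_frame(&mut output, &encode(&frame))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    serve(stdin.lock(), stdout.lock())
}
```

The nice property of this setup is that a crash or panic in the encoder only kills the external process, not the Erlang VM, which is a large part of why a port is the safer choice here compared to a NIF.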