Because it’s a power tool, not a beginner tool. The beginner tools are Notepad and nano, where instructions are spelled out or follow familiar conventions.
Expert tools are for the problems you still face after learning the ins and outs of the beginner tools. Vim solves them by smashing convention in order to build a system of composable elements. Emacs solves them by building a VM and a user interface where everything in between is customizable.
Here is an article by Bram Moolenaar that explores the problem space of a good text editor.
This explains nothing, since it applies to any set of keybinds. If you had to type "word back" in normal mode to move by word, or "PRETTY PLEASE LET ME OUT" to exit, you could still give the same irrelevant answer: "but it's a power tool smashing convention!"
Yep, but Vim’s keybinds are composable and carry less cognitive load than the conventional way. That’s the selling point. Not knowing how to use it isn’t a good argument for not using it. Riding a motorcycle isn’t natural, but the speed improvement is real.
Cognitive load is somewhat subjective. The composability advantage is what's hard to wrap your head around if you've never developed the skill. Vim / Helix / Kakoune are fundamentally more powerful keybinding systems due to their composability ("grammar"). Learning Vim early on in my career is easily one of the greatest skill investments I've ever made. Every minute saved doing small edits adds up over years, and I've pulled off many large-scale refactors in literally 1% of the time it takes other people with non-composable keybindings.
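To make the "grammar" point a bit more concrete, here is a toy sketch in Python. It is not how Vim is actually implemented, and every function name in it is made up for illustration; it only models the idea that an operator (`d`, `gU`) and a motion (`w`, `$`) are defined separately and then combined, so N operators and M motions give you N×M commands for the price of learning N+M things.

```python
# Toy model of Vim-style "operator + motion" composition.
# Hypothetical names throughout; this is NOT Vim's real implementation.

def motion_word(text, pos):
    """Roughly like `w`: index of the start of the next word after pos."""
    i = pos
    while i < len(text) and not text[i].isspace():  # skip rest of current word
        i += 1
    while i < len(text) and text[i].isspace():      # skip the whitespace after it
        i += 1
    return i

def motion_end_of_line(text, pos):
    """Roughly like `$`: index just before the next newline (or end of text)."""
    j = text.find("\n", pos)
    return len(text) if j == -1 else j

def op_delete(text, start, end):
    """Roughly like `d`: remove the span covered by the motion."""
    return text[:start] + text[end:]

def op_upper(text, start, end):
    """Roughly like `gU`: uppercase the span covered by the motion."""
    return text[:start] + text[start:end].upper() + text[end:]

# Operators and motions are registered independently of each other.
OPERATORS = {"d": op_delete, "gU": op_upper}
MOTIONS = {"w": motion_word, "$": motion_end_of_line}

def run(command, text, pos=0):
    """Parse '<operator><motion>' and apply the operator over the motion's span."""
    for op_key, op in OPERATORS.items():
        if command.startswith(op_key) and command[len(op_key):] in MOTIONS:
            end = MOTIONS[command[len(op_key):]](text, pos)
            return op(text, pos, end)
    raise ValueError(f"unknown command: {command!r}")

print(run("dw", "hello big world"))        # delete one word forward
print(run("gU$", "hello big world", 6))    # uppercase from cursor to end of line
```

Adding one new motion to `MOTIONS` makes it work with every existing operator without touching them, which is the part that non-composable keybinding schemes can't give you.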
I imagine a lot of people will increasingly lean on AI to handle most editing tasks, but until people literally stop using keyboards, Vim will still be worth learning.