I usually build huge context/prompt documents (10-100K tokens) before doing anything; I suppose that helps.
I’ll experiment with comments; I can always delete them later. My strategy is to have self-documenting code (and my prompts include a how-to on self-documenting code).
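To illustrate the style that how-to pushes for (this is a made-up example, not from my actual prompt doc): names and constants carry the information that would otherwise live in comments.

```python
# Comment-driven version: the meaning lives in the comment, not the code.
def calc(d, t):
    # add late fee if past the grace period
    return d + 5.0 if t > 30 else d

# Self-documenting version: names carry the same information.
LATE_FEE = 5.0
GRACE_PERIOD_DAYS = 30

def amount_due_with_late_fee(amount_due: float, days_since_invoice: int) -> float:
    is_past_grace_period = days_since_invoice > GRACE_PERIOD_DAYS
    return amount_due + LATE_FEE if is_past_grace_period else amount_due
```

Both behave identically; the second stays readable after the comments are stripped.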
But that information is scattered. It's helpful for the LLM to cluster and isolate local reasoning that it can then "forget" about when it moves on to the next thing. Attending to nearby recent tokens is easy for it; looking up relevant information needle-in-a-haystack style every single time is more error-prone. I'm not saying that asking it to remove comments will lead to a catastrophic drop in performance, maybe something like a few percent or even less. Just that it's not useless for pure benchmaxxing.
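Concretely, the kind of comment I mean (hypothetical example): a short block that states the local invariant right where it's used, so the model can check each branch against the line above it instead of re-deriving the intent from distant code.

```python
def merge_sorted(a: list[int], b: list[int]) -> list[int]:
    # Local reasoning, clustered here: `out` is always sorted and contains
    # exactly the consumed prefixes of `a` and `b`. Each branch below can be
    # verified against this invariant without reading anything else.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```

Delete that comment and the code still works, but now the reasoning has to be reconstructed from the loop body every time.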