LLM agents don't know how to shut up, and they always think they're right about everything. They also lack the ability to be brief. Sometimes a fix needs a single character or line, but no, they write a full page. And they write paragraphs of comments for even the most minuscule of changes.
They talk at you; they're overbearing and arrogant.
I expect a lot of the things people don't like ("output too long", "too many comments in code") are side effects of making the LLM good in other areas.
Long output correlates with less laziness when writing code, and with higher benchmark performance, since scores tend to rise monotonically with the number of output tokens. Comment spam correlates with better performance because comments are locally specific reasoning the model can attend to when writing the next line of code, which reduces errors.
Just tell it in the prompt not to include comments and to talk less.
I have a prompt document that includes a complete summary of the Clean Code book, which includes the rules about comments.
You do have to remind it occasionally.
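As a minimal sketch of what a standing instruction like that might look like when assembled into each request (the rule wording, `STYLE_RULES`, and `build_prompt` are all illustrative, not from any particular tool or API):

```python
# Hypothetical standing style rules, prepended to every coding request.
# The exact wording here is illustrative, not a tested prompt.
STYLE_RULES = """\
- Do not add code comments unless the logic is genuinely non-obvious.
- Prefer the smallest change that solves the problem.
- Keep explanations to one or two sentences.
"""


def build_prompt(task: str) -> str:
    """Combine the standing style rules with the concrete task text."""
    return STYLE_RULES + "\nTask: " + task


print(build_prompt("rename the config key 'timeout' to 'timeout_ms'"))
```

The point of keeping the rules in one constant is that "you do have to remind it occasionally" becomes automatic: the reminder rides along with every task instead of relying on you to repeat it.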
You can, but I would expect code correctness to suffer: you're removing one mechanism the model uses to dump local reasoning immediately before the point where it's needed.
By that logic, I should ask the AI to _increase_ the number of comments. I highly doubt the comments it generates are useful; they're usually very superficial.
Perhaps not useful to you, but they are the only way the LLM has of tracking what it is doing.
It has to reason about the problem in its output, since its output comprises almost the entirety of its "awareness". Unlike you, the LLM doesn't "know" anything, even superficial things.
In some sense it's like us when we're working on a problem with lots of novel parts. We usually have to write down notes to refer to while solving it, except that for the LLM, every problem is a novel problem.
I usually use huge context/prompt documents (10-100K tokens) before doing anything, I suppose that helps.
I'll experiment with comments; I can always delete them later. My strategy is to have self-documenting code (and my prompts include a how-to on writing self-documenting code).
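The "delete them later" step can even be mechanical. Here's a rough sketch of stripping `#` comments from Python source with the stdlib `tokenize` module (`strip_comments` is my own illustration, not a tool mentioned in the thread; lines that held only a comment are left behind as blank lines):

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Rebuild Python source from its token stream, dropping COMMENT tokens.

    String literals (including docstrings) are untouched; a line that
    held only a comment becomes a blank line.
    """
    out = []
    last_line, last_col = -1, 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        tok_type, tok_str, (sline, scol), (eline, ecol), _ = tok
        if tok_type == tokenize.COMMENT:
            continue  # drop the comment, keep every other token
        if tok_type in (tokenize.NEWLINE, tokenize.NL):
            # Emit the line break directly so we don't re-pad the
            # whitespace that used to sit before a trailing comment.
            out.append("\n")
            last_line, last_col = eline, ecol
            continue
        if sline > last_line:
            last_col = 0  # new physical line: reset column tracking
        if scol > last_col:
            out.append(" " * (scol - last_col))  # preserve indentation
        out.append(tok_str)
        last_line, last_col = eline, ecol
    return "".join(out)


print(strip_comments("x = 1  # obvious\ny = 2\n"))
```

Token-based stripping beats a regex here because `tokenize` knows the difference between a real comment and a `#` inside a string literal.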
But that information is scattered. It's helpful for the LLM to cluster and isolate local reasoning that it can then "forget" about when it moves on to the next thing. Attending to nearby recent tokens is easy for it; a needle-in-a-haystack lookup of the relevant information every single time is more error-prone. I'm not saying asking it to remove comments will lead to a catastrophic drop-off in performance, maybe something like a few percent or even less. Just that comment spam isn't useless for pure benchmaxxing.
I have added it to the guidelines doc for Junie, and that doesn't stop it. It can't help itself: it needs to write a comment every three lines, no matter the language it's writing in.
I was trying out Sonnet 4 yesterday and it spent 15 minutes changing, testing, changing again, etc., just to get one config item changed. It ended up modifying 40 files for no reason. It also kept trying to open a debugger that didn't exist and to load a webpage that requires auth.
They're far from perfect, that's for sure.
I don't think anyone is seriously claiming they're perfect. The thing is, all of AI is moving five times faster than any disruptive tech before it.
We went from proofreading single emails to researching agentic coding in a year.
It should have been five.