khendron 2 days ago

When I first tried an LLM agent, I was hoping for an interactive, 2-way, pair collaboration. Instead, what I got was a pairing partner who wanted to do everything themselves. I couldn't even tweak the code they had written, because it would mess up their context.

I want a pairing partner where I can write a little, they write a little, I write a little, they write a little. You know, an actual collaboration.

icedchai 2 days ago

Have you tried recently? This hasn't been my experience. I modify the code it's written, then ask it to reread the file. It generally responds with "I see you changed the file and [something]." Or when it makes a change, I tell it I need to run some tests. I provide feedback, explain the problem, and it iterates. This is with Zed and Claude Sonnet.

dkersten 2 days ago

I do notice though that if I edit what it wrote before accepting it, and then it sees it (either because I didn’t wait for it to finish or because I send it another message), it will overwrite my changes with what it had before my changes every single time, without fail.

(Zed with Claude 4)

jagged-chisel 2 days ago

Gemini has insisted on remembering an earlier version of a file even after its own edits.

“We removed that, remember?”

“Yes! I see now …”

Sometimes it circles back to that same detail that no longer exists.

icedchai 2 days ago

Interesting. With my workflow, I always wait for it to finish.

dkersten 2 days ago

It does it even if I wait for it to finish but don't accept. E.g.:

Starting code: a quick brown fox

prompt 1: "Capitalize the words"

AI: A Quick Brown Fox

I don't accept or reject, but change it to "A Quick Red Fox"

prompt 2: "Change it to dog"

AI: A Quick Brown Dog

icedchai 2 days ago

Do you tell it to reread the file? Seems like the updates aren't in the context.

dkersten 2 days ago

Hmm, perhaps not. I’ll have to experiment more.

Macha 2 days ago

My approach has generally been to accept, refactor and reprompt if I need to tweak things.

Of course, this artificially inflates the "accept rate" that the AI companies use to claim it's writing good code, rather than registering as a "sigh, I'll fix this myself" moment.

searls 2 days ago

I do this too and it drives me nuts. It's very obvious to me (and perhaps to anyone without an incentive to maximize the accept rate) that the diff view really struggles. If you leave a large diff, Copilot and Cursor will both get confused and start duplicating chunks, or they'll fail to see the new (or the old) code; but if you accept it, it always works.

jeffrallen 2 days ago

Aider solves this by turn-taking. Each modification is a commit. If you hate it, you can undo it (type /undo and it does the git reset --hard for you). If you can live with the code but want to start tweaking it, do so, then /commit (it writes the commit message for you by reading the diff you made). Working in turns, by commits, Aider can see what you changed and keep up with you. I usually squash the commits at the end, because the wandering way of correcting the AI is not really useful history.
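For example, a session might look like this (/undo and /commit are real Aider commands; the prompt and file name are made up):

    $ aider app.py
    > add input validation to parse_args
    # Aider edits the file and makes a commit automatically
    > /undo
    # hated it: Aider does the git reset --hard for you
    # ...or tweak the code yourself in your editor, then:
    > /commit
    # Aider writes the commit message by reading your diff
    $ git rebase -i main   # squash the wandering history at the end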

psadri 2 days ago

I usually add “discuss first. Don’t modify code yet”. Then we do some back and forth. And finally, “apply”.

dragonfax 2 days ago

Claude Code has "plan mode" for this now. It enforces this behavior. But it's still poorly documented.
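(For anyone hunting for it: Shift+Tab cycles modes in the REPL, and I believe recent versions also take a CLI flag; double-check against your version:)

    $ claude --permission-mode plan
    # plan mode: Claude reads the codebase and proposes a plan,
    # but won't edit files until you approve it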

psadri 2 days ago

They should add a “cmd-enter” for ask, and “enter” to go.

Separately, if I were at Cursor (or any other company, for that matter), I'd have the AI scouring HN comments for "I wish x did y" suggestions.

falcor84 2 days ago

I've been thinking about this a lot recently - having AI automate product manager user research. My train of thought goes something like this:

0. AI can scour the web for user comments/complaints about our product and automatically synthesize those into insights.

1. AI research can be integrated directly into our product, allowing the user to complain to it just-in-time, whereby the AI would ask for clarification, analyze the user's needs, and autonomously create/update an idea ticket on behalf of the user.

2. An AI integrated into the product could, in some cases, actually change the product UI/UX on its own: perform ad-hoc user research by asking the user "would it be better if things were like this?", measure objective usability metrics (e.g. task completion time), and then use that validated insight to automatically spawn a PR for an A/B experiment.

3. Wait a minute - if the AI can change the interface on its own - do we even need to have a single interface for everyone? Perhaps future software would only expose an API and a collection of customizable UI widgets (perhaps coupled with official example interfaces), which each user's "user agent AI" would then continuously adapt to that user's needs?

darkwater 2 days ago

> 3. Wait a minute - if the AI can change the interface on its own - do we even need to have a single interface for everyone? Perhaps future software would only expose an API and a collection of customizable UI widgets (perhaps coupled with official example interfaces), which each user's "user agent AI" would then continuously adapt to that user's needs?

Nice, in theory. In practice it will be "Use our Premium Agent at 24.99$/month to get all the best features, or use the Basic Agent at 9.99$ that will be less effective, less customizable and inject ads".

falcor84 2 days ago

Well, at the end of the day, capitalism is about competition, and I would hope for a future where that "user agent AI" is a local model fully controlled by the user, and the competition is over which APIs you access through it - so maybe "24.99$/month to get all the best features", but (unless you relinquish control to MS or Google) users wouldn't be shown any ads unless they chose to receive them.

We're seeing something similar in VS Code and its zoo of forks - we're choosing which API/subscriptions to access (e.g. GitLens Pro, or Copilot, or Cursor/Windsurf/Trae etc.), but because the client itself is open source, there aren't any ads.

mdemare 2 days ago

Claude Code denies that it has a plan mode...

lodovic 2 days ago

I try to be super careful: I type the prompt I want to execute into a text file, ask the agent to validate and improve it, and ask it to add an implementation plan. I even let another agent review the final plan. But even then, it occasionally still starts implementing halfway through a refinement.
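Roughly, the loop looks like this (the tool invocations are just an illustration of the workflow, not exact commands; claude -p is Claude Code's non-interactive print mode):

    $ $EDITOR plan.md    # hand-written prompt: goal, constraints
    $ claude -p "Validate and improve plan.md, then append an
      implementation plan. Do NOT write any code yet."
    $ claude -p "Act as a reviewer: critique the implementation
      plan in plan.md."    # the second agent
    # only once the plan survives review: "implement plan.md"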

carpo 2 days ago

Same. I use /ask in Aider so I can read what it's planning, ask follow-up questions, get it to change things, then after a few iterations I can type "Make it so" while sitting back to sip on my Earl Grey.
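Something like this (prompts invented; /ask is the real Aider command, and a bare message afterwards goes back to code mode):

    > /ask how would you add retry logic to fetch_data?
    # Aider answers in chat only; no files are touched
    > /ask can we do it without a new dependency?
    > Make it so
    # back in code mode: Aider edits and commits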

tomkwong 2 days ago

I've done something slightly different: I ask the LLM to prepare a design doc, not code, and iterate on that doc before asking it to start coding. That seems to have worked a little better, as it's less likely to go rogue.

psadri 1 day ago

Don't forget to realign the dilithium crystals once in a while.

mock-possum 2 days ago

In all honesty - have you tried doing what you would do with a paired programmer - that is, talk to them about it? Communicate? I’ve never had trouble getting cursor or copilot to chat with me about solutions first before making changes, and usually they’ll notice if I make my own changes and say “oh, I see you already added XYZ, I’ll go ahead and move on to the next part.”

lomase 2 days ago

> I’ve never had trouble getting cursor or copilot to chat with me about solutions first before making changes

Never had any trouble... and then they lived happily ever after.

tobyhinloopen 2 days ago

You can totally do that. Just tell it to.

If you want an LLM to do something, you have to explain it. Keep a few prompt docs around and load them into every conversation.
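How you load them depends on the tool. Two real examples (the file names are whatever you like):

    # Aider: include a conventions doc read-only in every session
    $ aider --read CONVENTIONS.md
    # Claude Code: a CLAUDE.md at the repo root is pulled into
    # context automatically at the start of every conversation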

artursapek 2 days ago

I do this all the time with Claude Code. I’ll accept its changes, make adjustments, then tell it what I did and point to the files or tell it to look at the diff.
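Concretely, the catch-up message is usually just something like (names invented):

    > I renamed fetchUser to loadUser in src/api.ts and tightened
    > the error handling. Look at git diff to see exactly what
    > changed, then continue with the pagination work.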

Pair programming requires communicating both ways. A human would also lose context if you silently changed their stuff.

haneul 2 days ago

Hmm, you can tweak fine these days without messing up context. But I run in "ask mode" only, with Opus in Claude Code and o3 max in Cursor. I specifically avoid agent mode because, like the post says, I feel like I gain less over time.

I infrequently tab-complete. I type out 80-90% of what is suggested, with some modifications. It does help that I can maintain 170 wpm indefinitely on the low-medium end.

Keeping up with the output isn't much of an issue at the moment, given the limited typing speed of Opus and o3 max. Having gained more familiarity with the workflow, the reading feels easier. It felt too fast at first, for sure.

My hot take is that if GitHub Copilot is your window into LLMs, you're getting the motel experience.

catlifeonmars 2 days ago

> My hot take is that if GitHub Copilot is your window into LLMs, you're getting the motel experience.

I’ve long suspected this; I lean heavily on tab completion from copilot to speed up my coding. Unsurprisingly, it fails to read my mind a large portion of the time.

Thing is, mind-reading tab completion is what I actually want in my tooling. It is easier for me to communicate via code rather than prose, and I find the experience of pausing and using natural language to be jarring and distracting.

Writing the code feels like a much more direct form of communicating my intent (in this case, to the compiler/interpreter). Maybe I'm just weird; to be honest, I'm afraid to give up my "code first" communication style for programming.

Edit: I think the reason I find the conversational approach so difficult is that I tend to think as I code. I have fairly strong ADHD, and coding gives me an appropriate amount of stimulation to do design work.

maleldil 2 days ago

Take a look at aider's watch mode. It seems like a bridge for code completion with more powerful models than Copilot.

https://aider.chat/docs/usage/watch.html
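The gist (the flag and the "AI!"/"AI?" comment markers are from the Aider docs; the snippet itself is invented):

    $ aider --watch-files
    # then, in your own editor, end a comment with "AI!":
    #   def parse_date(s):
    #       # handle ISO 8601 and epoch seconds AI!
    # on save, Aider spots the marker and implements it in place;
    # ending with "AI?" asks a question instead of editing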

catlifeonmars 2 days ago

Thank you! I will check it out

pbhjpbhj 2 days ago

I've asked for hints/snippets to give ideas and then implemented what I wanted myself (not commercially). Worked OK for me.