Item 43999300

Jarwain • 1 day ago

Aider's benchmarks show 4.1 (and 4o) work better in its architect mode, for planning the changes, and o3 for making the actual edits

You have that backwards. The leaderboard results have the thinking model as the architect.

In this case, o3 is the architect and 4.1 is the editor.

drewnick • 1 day ago

I see o3 (high) + gpt-4.1 at 82.7% -- the highest on the benchmark currently.