It's probably optimized in some way, but if the optimizations degrade performance, let's hope that shows up in the various benchmarks. An alternative hypothesis is that it's the same model, but that in the early days they make it think "harder" and run a meta-process to collect training data for reinforcement learning on future models.
It's a bit dated now, but it would be cool if people submitted PRs for this one: https://aider.chat/docs/leaderboards/by-release-date.html
Dated? This was updated yesterday: https://aider.chat/docs/leaderboards/
My link is to the benchmark results _over time_.
The main leaderboard page you linked to is updated quite frequently, but it doesn't show multiple benchmark results for the same model over time.