Interesting article, as usual.
I'm just unconvinced by a fundamental aspect of it:
> Our analysis focused on self-reported time savings, where developers estimated the number of hours per week they saved through the use of AI coding assistants.
You could argue that it's "old" in "AI terms" (I wouldn't agree); you can also argue with the exact findings, but one thing seemed to be pretty clear from the 2025 METR study on AI in OSS: humans are particularly bad and unreliable when self-reporting productivity gains.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
So, why are you still reporting that as a reliable indicator for the study?
Also, this conclusion:
> This suggests that the plateau in time-savings may not be a failure of the tool or the user, but a sign that the bottleneck has shifted from individual code production to team-level coordination.
What evidence is there that the bottleneck has indeed shifted, that is, that it was ever "individual code production"? Has the bottleneck really shifted, or was it always team-level coordination (and decision-making, and figuring out what's the right thing to do, and dealing with changing priorities, etc.)?
The article implicitly assumes that a shift happened, but what evidence supports that statement?
Thanks for the response, Sergio!
While I’m a fan of the METR study, I’d be careful about drawing overly broad conclusions from a study of 16 developers. I think the more defensible takeaway is that self-reports can diverge meaningfully from externally measured outcomes in some contexts, not that self-reports are universally without value.
Other studies, like this one from Moritz Beller, suggest developers are at least directionally consistent in self-reports:
https://arxiv.org/pdf/2012.07428
And this excellent study from Andre Meyer highlights the research value of having developers self-report how they spend their time:
https://www.microsoft.com/en-us/research/publication/today-was-a-good-day-the-daily-life-of-software-developers/
So I don’t agree with the premise that developer perceptions are inherently unreliable or uninformative. I’m a strong believer in mixed-methods research, and I think there’s substantial value in triangulating across telemetry, experiments, qualitative insights, and self-reports.
On the bottleneck point: I largely agree with you. I don’t think coding was ever the only bottleneck, and AI may simply be shining a spotlight on constraints that were always there.
My own research has found that reducing collaboration friction can have a larger impact than AI adoption itself, and in another study, “Lack of clear business objectives” was the fourth most frequently cited workplace challenge among developers.
I completely agree that the fundamentals of good teamwork, coordination, and decision-making have always mattered enormously, and may matter even more moving forward.