Findings from Microsoft’s 3-week study on Copilot use
After three weeks of using Copilot, developers felt more positive about the tool and AI overall, but they also emphasized the ongoing need to carefully validate AI-generated code.
Welcome to the latest issue of Engineering Enablement, a weekly newsletter sharing research and perspectives on developer productivity.
This week I read "Dear Diary: A randomized controlled trial of Generative AI coding tools in the workplace," a study from Microsoft that takes a closer look at the impact of GitHub Copilot after a period of regular use. The researchers examined both the measurable effects on productivity and how developers' perceptions of AI tools changed.
My summary of the paper
The impact of AI goes beyond assisting with coding—it can also influence how developers see themselves and their role at work. This study set out to understand that deeper impact.
To explore how developers’ beliefs about AI tools change with regular use, the researchers conducted a randomized controlled trial. Over 200 engineers were enrolled and surveyed about their pre-existing views on GenAI tools, as well as their beliefs about their roles in their workplace. Participants were then split into a control group and a treatment group. The treatment group was given access to GitHub Copilot for three weeks.
During the three-week study, developers completed daily surveys about whether they used Copilot, how they used it, and their productivity. At the end of the study, participants were re-surveyed to understand whether Copilot usage shaped their perceptions of AI tools and their roles. The researchers also collected telemetry data, such as lines of code written, time spent coding, and PR activity, to track any changes in output.
Here’s what they found:
Developers’ preconceived notions about AI tools
Coming into the study, most developers weren't regular Copilot users. (Only 16% were; the rest had used Copilot at most a few times.) The top reasons developers hadn't adopted it were that they 1) were too busy or didn't have enough time, 2) didn't think it would work well, 3) had a technical blocker, or 4) didn't feel the need to use it.
The pre-study survey surfaced several other insights about developers’ preconceived notions about AI tools:
Developers with prior experience were significantly more likely to view Copilot as useful and enjoyable. Those without experience were far more skeptical.
Regardless of prior experience with AI, most developers were cautious about the code generated by AI. Only about 20% said they trusted the code, and just 30% believed the tools were reliable.
65% of developers believed they couldn't be replaced by AI. The rest were mostly neutral, with 10% believing they could be replaced by AI.
57% of the participants felt positive about their colleagues using AI tools, citing improved productivity and usefulness, or simply expressing admiration for developers who had already been using AI.
The impact of AI use on developers’ beliefs and productivity
After the study, the researchers analyzed the telemetry and survey data. They found that regular use of Copilot led to meaningful shifts in how developers perceive AI tools. Developers reported significantly higher enjoyment and perceived usefulness after using Copilot. Additionally, more developers began to view GenAI as inevitable and representative of "the future."
When it comes to productivity, the findings were nuanced. Telemetry metrics like time spent coding and PR activity showed no statistically significant changes before and after using Copilot. However, the researchers suggest this may be due to the short duration of the study. As they note, “The study [may have been] too short for learning and getting proficient with a new tool.” More recent research points to a longer ramp-up period (around 11 weeks of daily use) before developers begin to see measurable gains.
Despite the lack of impact seen in the telemetry data, developers did report productivity improvements. In post-study surveys, many said Copilot saved them time, particularly by reducing the time required to write boilerplate or repetitive code. Others noted that it helped them stay focused by keeping them in the IDE, made it easier to learn new languages, and even increased their likelihood of writing unit tests and documentation.
Also interesting: 66% of developers reported a change in how they feel about their work after using AI tools. For some, it sparked a sense of excitement and optimism. For others, it created a sense of urgency to upskill and not be left behind.
How developers use Copilot and the challenges they experience
Each day during the study, developers were asked whether they used Copilot and how they used it. Copilot showed up in 56% of daily surveys. That might seem low, but developers do a lot more than just write code, so it shouldn’t be surprising that coding assistants weren’t used every day. (Prior research has found that on a “good day,” about 20% of a developer’s time is spent writing code.)
When Copilot was used, the top use cases were writing boilerplate code, repetitive code, docs, and comments. Another theme was using AI to replace web search: developers turned to it for help understanding an application, exploring new APIs, or finding answers to niche questions.
Developers were also asked about the challenges they faced each day with Copilot. The most common issues included:
Copilot generating code that looked correct but wasn't, raising concerns about subtle bugs slipping through.
The extra time required to validate AI-generated code, which sometimes canceled out any productivity gains.
Copilot not working as well for less common languages and file types.
A common theme that emerged from the challenges developers faced was the need for more validation. Code can now be generated more quickly than ever before, but it is “faster with more mistakes” and therefore requires a greater degree of critical analysis.
Final thoughts
This study offers a different lens for understanding the adoption challenges and early impact of GenAI tools. Developers saw Copilot as more useful and enjoyable after regularly using it for just three weeks. They also reported that it helped save time on repetitive tasks. At the same time, the study reinforces the concern that AI-generated code can look right but be wrong, which demands a greater focus on validation.
This research is also a helpful reminder to set realistic expectations. Gains from GenAI tools aren’t immediate, especially in the early phases when developers are still building confidence and figuring out how to integrate these tools into their work. For DevProd teams, the study sheds light on what slows adoption: why developers hadn’t tried Copilot, what challenges they faced once they did, and what concerns still remain.
Who’s hiring right now
This week’s featured DevProd & Platform job openings. See more open roles here.
ScalePad is hiring a Head of AI Engineering & Enablement | Canada (Remote or in-office)
Capital One is hiring a Manager, Product Management - Platform | Plano, TX and McLean, VA
Preply is hiring a Senior DevEx Engineer | Barcelona
Snowflake is hiring a Director of Engineering - Test Framework | Bellevue and Menlo Park
That’s it for this week. Thanks for reading.
-Abi