Microsoft's New Future of Work Report
The most important ideas and emerging themes related to AI in the workplace.
This is the latest issue of my newsletter. Each week I cover the latest research and perspectives on developer productivity.
Quick note: On Feb 13th, Laura Tacho and John Cutler will be hosting a conversation about the types of waste and friction in software development. If you’re interested in joining that discussion, you can register for it here.
This week I read Microsoft’s latest New Future of Work report. Microsoft started this series three years ago with the goal of covering major shifts in how work gets done. Their first reports covered the shift to remote and hybrid work. Their latest report focuses on AI.
There have been many excellent papers published about the ways in which work may change as LLMs, such as GitHub’s Copilot, are integrated. Microsoft’s report synthesizes some of the most important or emerging themes from this research.
In the future, I’ll cover papers on AI and software development in more depth. This issue gives an overview of the most important ideas and emerging research themes related to AI in the workplace.
Note: I recently had Eirini Kalliamvakou, a staff researcher at GitHub, on my podcast to talk about their research findings about the impact of AI on developer productivity. I’ve included a few quotes from that conversation in this summary where they fit.
Here were my takeaways from the report:
1. AI holds promise to increase developer productivity. Currently, the benefits of LLMs in software engineering depend on the task.
Tools like GitHub’s Copilot can generate code from natural language prompts and code snippets. One study from GitHub and Microsoft found that developers using Copilot completed a specified task 56% faster than those not using Copilot. (Researchers asked developers to implement an HTTP server in JavaScript, a task chosen for the experiment because it’s repetitive.) In the same experiment, developers noticed but underestimated the speedup: they estimated a 35% increase in productivity.
A separate study, which looked at a different set of tasks, did not find that Copilot improved task completion time or success rate. However, developers preferred using Copilot and found it helpful because it provided a starting point for their task rather than a blank canvas. Developers also surfaced concerns: some found it difficult to understand and change the code Copilot generated.
2. Issues arise with writing prompts and overreliance
Writing effective prompts is difficult. It requires significant effort, including multiple iterations of modification and testing. Additionally, developers find it burdensome to inspect and verify that the generated code is correct. Research is being devoted to making this easier; tools and training can also significantly help people get better at communicating with LLMs.
Another concern is overrelying on AI and accepting incorrect AI outputs. In one study, a group of developers accepted the generated code without validating that it was correct. This can cause more work later on, when they have to go back and debug the previous code. Overreliance can be caused by familiarity with a task or confirmation bias, for example.
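One lightweight mitigation for overreliance is to validate generated code automatically before accepting it. Below is a minimal sketch of that idea; `generate_code` is a hypothetical stand-in for any LLM call, and the retry prompt and checks are illustrative assumptions, not a description of how Copilot or any real tool works.

```python
# Sketch: only accept LLM-generated code after it passes caller-supplied checks.
def generate_code(prompt: str) -> str:
    # Hypothetical placeholder: a real implementation would call an LLM API here.
    return "def add(a, b):\n    return a + b\n"

def accept_if_valid(prompt, checks, max_attempts=3):
    """Regenerate until the candidate passes all checks, or return None."""
    for _ in range(max_attempts):
        code = generate_code(prompt)
        namespace = {}
        try:
            exec(code, namespace)           # load the candidate code
            if all(check(namespace) for check in checks):
                return code                 # validated: safe to accept
        except Exception:
            pass                            # candidate didn't even run
        prompt += "\nThe previous attempt failed its checks; fix it."
    return None

checks = [lambda ns: ns["add"](2, 3) == 5]
code = accept_if_valid("Write a Python function add(a, b).", checks)
print(code is not None)
```

The point of the sketch is simply that acceptance is gated on evidence of correctness rather than on how plausible the output looks, which is where overreliance tends to creep in.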
Another notable challenge is that LLMs still perform best in English, with performance dropping when used in other languages.
3. LLMs help the least experienced the most
Most early studies have found that new or low-skilled workers benefit the most from LLMs. For example, GitHub’s study found that developers with less programming experience, older programmers, and those who program more hours per day benefited the most from using Copilot.
Other studies of information workers, not just developers, found similar results. For example, one study of customer support agents found that a generative AI-based conversational assistant increased the number of issues resolved per hour by 14% on average, with a 34% improvement for newer or less-skilled workers. The tool also helped disseminate knowledge from more experienced workers to newer employees.
4. Adoption is influenced by how well AI tools fit within workflows
AI tool adoption can be influenced by social interactions—specifically, how someone’s coworkers use and talk about technology. It is also influenced by how well the tools fit within developers’ workflows.
Note: In our conversation, Eirini said that adoption of a tool like Copilot often happens organically. Typically, developers at the organization start using it, then those individuals advocate for it and train others. This also reminds me of a paper I’ve summarized which found that adoption of internal tools was driven by whether the tool was compatible with the way developers work, and by how others within the organization were using it.
5. In the future, analyzing and integrating information become more important than being able to generate content
The use of AI represents a shift in which skills are most important. For example, because AI generates content, being able to analyze and integrate the information produced by AI-based tools may become more important than being able to create content. Additionally, skills such as research, conceptualization, planning, and prompting become more important.
Note: To paraphrase something Eirini said: today, AI is acting like a “second pair of hands” for developers. “If developers have undesirable, more drudge work tasks, AI helps them get through these tasks faster… What’s coming next is, instead of being like a second pair of hands, AI tools will be more of a second brain. They’ll help with more complex tasks, helping developers tackle more complexity and saving them more mental capacity.”
A major area where organizations may benefit from the use of AI in the future is with knowledge fragmentation. In most organizations, knowledge is scattered across documents, conversations, apps, and devices, which makes it harder for people to find the information they need. LLMs have the potential to draw on knowledge generated through, and stored within, different tools and formats, and then surface it to users when they need it.
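To make the knowledge-fragmentation idea concrete, here is a toy sketch of surfacing scattered knowledge: snippets from different sources are ranked against a user’s query. The sources, snippets, and keyword-overlap scoring are all illustrative assumptions; a real system would use embeddings and an LLM rather than word matching.

```python
# Toy sketch: rank knowledge fragments from different tools against a query.
def score(query: str, text: str) -> int:
    # Illustrative scoring: count words the query and snippet share.
    return len(set(query.lower().split()) & set(text.lower().split()))

# Hypothetical snippets pulled from a wiki page, a chat message, and a ticket.
docs = {
    "wiki/deploy.md": "How to deploy the billing service to staging",
    "chat/2023-11-02": "Reminder: staging deploys are frozen on Fridays",
    "ticket/OPS-812": "Rotate the database credentials quarterly",
}

def surface(query: str, k: int = 2):
    """Return the k best-matching sources for the query."""
    ranked = sorted(docs, key=lambda d: score(query, docs[d]), reverse=True)
    return ranked[:k]

print(surface("deploy to staging"))
```

Even this crude version shows the shape of the benefit: the user asks one question and relevant fragments are pulled together from tools they might never have thought to search.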
One known challenge with LLM systems is the time between a user issuing a prompt and receiving a response. There’s a great deal of research going into shortening this latency. This is an important problem to solve: even small delays can have a substantial negative effect on the user experience.
Final thoughts
There’s a lot of hype around AI, so I appreciate that Microsoft’s report shows how much research is going into understanding the impact AI is actually having. The research suggests that AI does have the potential to improve productivity and reduce cognitive load, yet its benefits are distributed unevenly across users, and it introduces new challenges.
In our conversation, Eirini said they’ve been hearing about leaders rolling out these tools with unrealistic expectations. An example of this is the belief that an organization can “try out” an AI-based tool for a month or two and “see how it goes.” Eirini says that’s not enough time: “It takes time and training… When you start rolling out a tool like this, you have to prepare the ground in terms of educating and training users because it takes effort to get everyone to use a tool like Copilot to its full potential.” Reports like this one from Microsoft may help leaders get a more realistic idea of what they should expect from AI.
That’s it for this week. If you know someone who might like this issue, consider sharing it with them:
-Abi
Interesting point about LLMs helping the least experienced the most, but I think it's a short-term view. What I see around me is juniors depending more and more on LLMs without a full understanding of the output. When it's time to debug or explain the code, it's much harder for them.
In my opinion, it's also less 'practice' for the brain. When you don't solve problems yourself and depend too much on the LLM, with time you'll be completely dependent on it.
That may be OK, since LLMs won't disappear; the question is what you do with the extra time, and what skills you bring to the table. If the only thing you can do is copy-paste from the LLM into Visual Studio Code, you are easily replaceable.
Quality is the major victim of this gold rush.
GitHub Copilot - 55% “faster coding,” and 46% more “code written.”
Yay! 🎉 And how about the quality of all this code being generated?
A study by GitClear (https://www.gitclear.com/) aimed to measure the implications of this phenomenon, classifying changed lines of code into operations:
- additions: usually correlate with the creation of new features
- moves: usually correlate with code refactoring
- deletions: tend to coincide with cleanup and increased codebase health
- duplicates: typically achieve the opposite
- churn: changes that were either incomplete or erroneous when they were authored
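To see how these categories turn into a quality signal, here is a minimal sketch of tallying each operation's share of all changed lines. The sample labels and numbers are made up for illustration, and GitClear's actual classification methodology is far more involved; this only shows the arithmetic of comparing shares across years.

```python
# Sketch: compute each change operation's share of all changed lines.
from collections import Counter

def operation_shares(operations):
    """Map each operation label to its fraction of all changed lines."""
    counts = Counter(operations)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}

# Hypothetical per-line labels for one repo in two different years.
ops_2020 = ["addition"] * 50 + ["move"] * 25 + ["deletion"] * 15 + ["duplicate"] * 5 + ["churn"] * 5
ops_2023 = ["addition"] * 50 + ["move"] * 15 + ["deletion"] * 13 + ["duplicate"] * 12 + ["churn"] * 10

for year, ops in (("2020", ops_2020), ("2023", ops_2023)):
    shares = operation_shares(ops)
    print(year, {op: round(s, 2) for op, s in sorted(shares.items())})
```

With made-up numbers like these, the year-over-year comparison surfaces exactly the pattern the report flags: churn and duplicates rising while moves fall.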
From their report:
"Looking at the variation of operation frequency and churn between 2020 and 2023, we find three red flags for code quality"
The most significant changes correlated with GitHub Copilot’s rise are an increase in churn and duplicates, and a decrease in moves.
https://arc.dev/developer-blog/impact-of-ai-on-code/