How Google Measures Developer Productivity
An interview with Engineering Productivity Researchers at Google on how they use a mixed-methods approach to measuring productivity.
This is the latest issue of my newsletter. Each week I cover the latest research and perspectives on developer productivity.
Google is consistently referred to as an industry leader when it comes to measuring developer productivity. The company’s approach to understanding productivity is impressive not just because of Google’s significant investment in this area, but also because of how thoughtful and systematic the approach is.
When Google Engineering Productivity Researchers Ciera Jaspan and Collin Green started writing their Developer Productivity for Humans series, I was eager to speak with them to learn more.
In this interview, Ciera and Collin take us under the hood to reveal more about their team’s work and how they measure developer productivity. I’ve summarized the key parts from the discussion below. You can also listen to the full interview here.
How the Engineering Productivity Research team works
What does your team focus on, and what backgrounds are on the team?
Ciera Jaspan: Our team was started to understand what we can do to improve developer productivity. Before our team was created, many of the decisions about what types of tools we needed at Google were made by an engineer taking a good guess. But that approach can only take you so far.
We wanted to create a team that would better understand the on-the-ground developer experience so that we could figure out how to improve tooling, processes, and everything that developers do.
We created the team with the idea of it being a mixed-methods research team. From the start, it was not just a team of software engineers. We also had UX researchers involved right away. We try to bring people from a variety of fields.
Collin Green: People are sometimes surprised to find out that we have eight software engineers as well as eight or nine UX researchers. As Ciera said, our backgrounds are quite diverse, especially among the researchers. For example, we've had behavioral economists, social psychologists, industrial organizational psychologists, and we have somebody from public health.
This strengthens what Ciera said about taking a mixed-methods approach. We really lean into using a wide variety of methods—used together in concert, not in isolation—so that we get a complete picture of what the developer experience is like. We do diary studies, survey research, interviews, qualitative analysis, and logs analysis. We use a range of methods to understand exactly what's happening as best and as holistically as we can.
Ciera Jaspan: The goal is always to triangulate on developer productivity. We use all these different research methods: are they all aligned? Are they all telling us the same underlying story of what's happening? Or are they misaligned, in which case we need to dig deeper to figure out what's going on?
What is an example of a team you partner with?
Collin Green: One of our big customers is the team that builds all of our internal and homegrown tools. For example, when they want to understand what makes developers productive and what could make them more productive, our research is one of the places they go to. Apart from helping these teams improve tools, infrastructure, and processes, we also help them understand the impact their work is having.
Measuring developer productivity
You mentioned taking a mixed-methods approach to developer productivity. What is that, and how does it fit into how Google measures developer productivity?
Ciera Jaspan: When we're measuring developer productivity, we start with the general philosophy that there is no single metric that captures developer productivity. You have to triangulate on this. We do that along multiple axes. The first is that we capture three aspects of productivity any time we’re measuring: speed, ease, and quality. These three aspects are in tension with each other, which is what we want. For example, you could increase speed by removing code review altogether, but that would be removing a basic quality check.
Within those three aspects of productivity, we capture different types of metrics. To measure speed, we’ll capture logs data and we’ll measure people's beliefs about how fast they're going.
The point is, there’s not one metric for speed. We have multiple metrics to understand speed, and they are captured using different methods. This exemplifies what it means to take a mixed-methods approach.
As another example, we look at active coding time at Google. To measure that concept, we have created a metric in our logs, and we also use diary studies that ask engineers to write down every few minutes during the day what they are doing, which we use to make sure that what they said they do is matching up with data from our logs. That gives us some confidence that our logs data is actually accurate.
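To make the cross-check concrete, here is a minimal sketch of comparing a logs-based number against a diary-based number for the same engineers on the same day. The engineer IDs, the minute values, the 20% tolerance, and the function name are all made up for illustration; the interview does not describe how Google actually runs this comparison.

```python
from statistics import mean


def agreement_report(log_minutes, diary_minutes, tolerance=0.2):
    """Compare logs-based and diary-based active coding minutes.

    Both arguments map a (hypothetical) engineer id to minutes for the
    same day. The tolerance is an assumed threshold, not a Google figure.
    """
    shared = sorted(set(log_minutes) & set(diary_minutes))
    gaps = {}
    for eng in shared:
        diary = diary_minutes[eng]
        if diary == 0:
            continue  # relative gap is undefined without a diary baseline
        gaps[eng] = abs(log_minutes[eng] - diary) / diary
    flagged = [eng for eng, gap in gaps.items() if gap > tolerance]
    return {
        "mean_gap": mean(gaps.values()) if gaps else None,
        "flagged": flagged,
    }


# Made-up example data: minutes of active coding per engineer.
logs = {"eng_a": 190, "eng_b": 120, "eng_c": 300}
diary = {"eng_a": 200, "eng_b": 200, "eng_c": 300}
report = agreement_report(logs, diary)
```

A small mean gap with few flagged engineers would support trusting the logs metric; a large gap is a prompt to dig into either data source.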
Have you run into discrepancies between the survey metrics and logs-based metrics? What have you learned from instances where the data doesn’t line up?
Ciera Jaspan: Yes, we've had some cases where there have been discrepancies. They tend to fall into two categories. One is that the logs data was wrong. This happens rather regularly when there's a discrepancy. We actually had a case recently where we were asking our engineers to take a survey after every build, in order to correlate build speeds with satisfaction and velocity. But we got some weird survey responses, where engineers said, "What are you talking about? I didn't do a build." We thought, "Well, that's weird, because the logs data says you did a build."
It turned out that the logs data was not just including builds that engineers kicked off themselves, it was also including robotic builds that were happening in the background that the developer wasn't even paying attention to. Those were useful for other purposes for the developer tools, but they didn't factor into the developers’ experience. When we removed those builds from the data, it gave us a very different picture about the build latency that developers were experiencing at Google.
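The effect Ciera describes is easy to reproduce with a toy example. The sketch below assumes each build record carries an "initiator" field that separates human-started builds from automated ones; that field, its values, and the latencies are hypothetical, since the interview does not say how the builds were actually distinguished.

```python
from statistics import median

# Made-up build log: a few slow human-started builds and several fast
# background builds the developer never waited on.
builds = [
    {"id": 1, "initiator": "human", "latency_s": 300},
    {"id": 2, "initiator": "automated", "latency_s": 20},
    {"id": 3, "initiator": "human", "latency_s": 420},
    {"id": 4, "initiator": "automated", "latency_s": 30},
    {"id": 5, "initiator": "automated", "latency_s": 25},
    {"id": 6, "initiator": "human", "latency_s": 360},
    {"id": 7, "initiator": "automated", "latency_s": 15},
]


def median_latency(records, human_only=False):
    """Median build latency, optionally restricted to human-started builds."""
    latencies = [
        r["latency_s"]
        for r in records
        if not human_only or r["initiator"] == "human"
    ]
    return median(latencies) if latencies else None


all_builds = median_latency(builds)
experienced = median_latency(builds, human_only=True)
```

In this toy data the median across all builds is 30 seconds, while the median for builds a developer actually kicked off is 360 seconds, which is the kind of shift in the picture that removing background builds can produce.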
Another cause of discrepancies is when there's an underlying facet that we weren't measuring yet with one of the methods. An example of this would be if the survey data was representing a larger concept than the logs data was. Maybe you’re asking engineers, “how do you feel about your developer velocity?” There are a lot of things that go into a developer's velocity, whereas maybe you're only measuring one small part of velocity with logs data. You might see those two metrics diverging because one of them is measuring a bigger concept.
There’s all this data you’re collecting. For the teams and leaders you work with across the organization, how do you figure out which numbers actually matter?
Ciera Jaspan: We always encourage people to follow the Goals, Signals, Metrics approach for deciding what to measure. First, write down your goals. What is your goal for speed, your goal for ease, and your goal for quality? Then ask yourself, what are some signals that would tell you that you've achieved your goal? What would be true of the world if you've achieved your goal?
Then, select metrics. That’s when we can start talking about what the right types of metrics are. Do we use surveys for this, or do we use logs?
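One way to picture Goals, Signals, Metrics is as a small nested structure: each goal has signals, and each signal has one or more metrics tagged with where the data comes from. The sketch below is only an illustration; the class names, the example goal, and the check for missing data sources are assumptions, not a description of Google's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    source: str  # "survey" or "logs"


@dataclass
class Signal:
    description: str  # what would be true of the world if the goal is met
    metrics: list = field(default_factory=list)


@dataclass
class Goal:
    aspect: str  # "speed", "ease", or "quality"
    statement: str
    signals: list = field(default_factory=list)


def uncovered_sources(goal, required=("survey", "logs")):
    """Return the data sources that no metric under this goal uses yet."""
    used = {m.source for s in goal.signals for m in s.metrics}
    return [src for src in required if src not in used]


# Hypothetical example: a speed goal measured only through logs so far.
speed_goal = Goal(
    aspect="speed",
    statement="Developers get build results quickly",
    signals=[
        Signal(
            description="Developers rarely wait on builds",
            metrics=[Metric(name="median build latency", source="logs")],
        )
    ],
)
gaps = uncovered_sources(speed_goal)
```

Here the check reports that the goal has no survey-based metric yet, a reminder to pair the logs number with a measure of what developers themselves report.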
Collin Green: We do have an assortment of metrics that we hold up as a good starting point that map to speed, ease, and quality. Our philosophy has been to lean into the metrics that we are confident in, because we've already done the work to validate that the metrics reflect what developers are experiencing.
We really lean into the notion that you need a variety of metrics across each aspect of productivity, as well as across objective and subjective measures, to understand what's happening.
The quarterly developer survey
Tell us more about the quarterly survey at Google. Who runs it, and how does it work?
Collin Green: We've been running the engineering satisfaction survey for over five years now. We use a sampling method to survey developers so the survey hits a representative population of about one third of Google engineers every quarter.
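As a rough illustration of surveying about a third of a population each quarter, here is a minimal sketch using a simple random sample. This is an assumption made for the example: the interview only says a sampling method is used to reach a representative third, and a real program might stratify by team, role, or location. The engineer IDs and function name are hypothetical.

```python
import random


def quarterly_sample(engineer_ids, fraction=1 / 3, seed=None):
    """Draw a simple random sample of roughly `fraction` of engineers.

    A fixed seed makes the draw reproducible for a given quarter.
    """
    rng = random.Random(seed)
    population = list(engineer_ids)
    k = round(len(population) * fraction)
    return sorted(rng.sample(population, k))


# Made-up population of 90 engineer ids.
engineers = [f"eng_{i:03d}" for i in range(90)]
q1_sample = quarterly_sample(engineers, seed=1)
```

Sampling without replacement means no engineer appears twice in one quarter's list, and surveying only a fraction each quarter keeps the burden on any one person low.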
There are many structured questions in the survey, including Likert-scale questions as well as open-ended questions. We'll ask, "How satisfied are you with the following engineering tools?" If a developer says they’re less than satisfied, there's a follow-up question: "You said that you didn't like some tools. Tell us more." Our engineers will write long, structured paragraphs. That's so useful, because it gives the tooling teams a direct source of qualitative information about what’s causing problems. The open-ended text questions have been a real goldmine for some of the product teams that either don't want to, or can't, run their own user research. We provide fairly organized and sanitized open texts for them to mine so they can orient their roadmaps around the feedback.
As far as who designs it, it's an evolving product. Initially, some UX researchers, along with a lot of collaboration from our engineers, decided on what topics to address. They crafted the survey and piloted it with engineers, then launched and iterated on it.
Every quarter there's a UX researcher and an engineer dedicated to executing the survey. It’s a team effort to work on refinements and triage requests for changes to the survey, and everybody is part of the analysis because there's so much data that comes out of it.
We've also built a lot of infrastructure around it. Executing a survey consistently and longitudinally is a challenging thing in and of itself, so having specific people assigned to that task is important. The engineering support for building infrastructure to automate key steps and to manage the data has also been huge.
It’s also worth mentioning that we focus on being transparent with the data. No data ever leaves our team un-aggregated, and the responses are all private. Once we aggregate the data, however, it goes out onto dashboards that are available to anyone.
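The idea of only sharing aggregated results can be sketched in a few lines. In the example below, individual responses are rolled up per tool and respondent identifiers never appear in the output. The field names and scores are made up, and the minimum group size is an assumption added for illustration (a common precaution so that very small groups cannot be traced back to individuals); the interview does not say whether Google applies such a threshold.

```python
from collections import defaultdict
from statistics import mean


def aggregate_for_dashboard(responses, min_group_size=3):
    """Roll up per-respondent scores into per-tool summaries.

    Groups smaller than `min_group_size` (an assumed threshold) are
    dropped rather than published.
    """
    by_tool = defaultdict(list)
    for r in responses:
        by_tool[r["tool"]].append(r["score"])
    return {
        tool: {"n": len(scores), "mean_score": round(mean(scores), 2)}
        for tool, scores in by_tool.items()
        if len(scores) >= min_group_size
    }


# Made-up raw responses; respondent ids stay inside the research team.
raw = [
    {"respondent": "r1", "tool": "build", "score": 4},
    {"respondent": "r2", "tool": "build", "score": 5},
    {"respondent": "r3", "tool": "build", "score": 3},
    {"respondent": "r4", "tool": "editor", "score": 2},
]
dashboard = aggregate_for_dashboard(raw)
```

Only the "build" summary survives here, because the single "editor" response falls below the assumed threshold.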
Ciera Jaspan: It's been interesting to see how people have really gotten into this. I'm always surprised to find new slide decks or different places in the company where people cite our survey data, when we haven’t talked with them. Because the survey is so widely used, people go back to the survey to either understand their team or the product they're building for other developers. They’re able to say, "We need to invest more here," or "We need to shift our focus to this particular area," and point to the survey data.
For others who may be trying to get a developer survey off the ground, how do you get buy-in for a survey program?
Ciera Jaspan: This makes me think about some VPs who were previously not sure about survey data and then became some of our biggest fans. We worked to help them understand how to utilize the survey data. Specifically, we encouraged these leaders to go to the survey data first, because if you just look at logs data, logs data doesn't tell you whether something is good or bad.
For example, we mentioned before that we have this active coding time metric. We know the active coding time it takes to make a change, but that number is useless by itself. Is this a good thing? Is it a bad thing? Do we have a problem? Who knows.
So we encourage executives to go to the survey first to see where their top problems are. We highlight to them, "It looks like the top five problems are the following things. Maybe one of those is something you can make an impact on." And the one they can impact might not be the top problem. The top problem might be something that they cannot independently change. But maybe problem number 2, 3, or 4 is in their control. Once we know what to focus on, we can go look at the logs data and try to see, "How big of an issue is this at scale? What's the number we want to set our goal to?"
We started convincing a few VPs to take this approach and they really liked it. They've now gone through this a few times saying, "Yeah, I just find my next big problem, focus on that, see it improve in the logs, and then later in the survey as it starts to fix things for the developer. Now let's go take the next big problem."
Is there one thing you feel is fundamentally misunderstood about surveys as a measurement instrument? If so, what would that thing be?
Collin Green: My glib answer is that people are under the misimpression that it's very easy to run a good survey, when in fact the easiest thing you can do is run a terrible survey. There are people who've dedicated their careers to survey construction. I'm not even one of them, I just have some training in this area. But I think people misunderstand how difficult it is to construct, execute, and analyze a survey effectively.
The other one relates to having to convince engineering leaders about the validity of surveys or any qualitative research. We point out to leaders that they've hired excellent engineers, excellent experts, and that asking them for their expert opinion is very valuable. When presented in that way, this can be persuasive.
I remember having a conversation with Ciera about why it's useful to ask engineers about these things. We go out as a company and we try to hire the best engineers out there, the smartest people we can find, and then we're like, "Ah, don't ask them any questions." That's really silly. Our engineers are very good integrators of information. They're observing all of these variables that impact their productivity. When you ask them, "How's your productivity?", they’re considering a lot of factors simultaneously in a way that is tailored to their own productivity.
While the natural instinct might be to write off what Google is doing as being too advanced for others to borrow from, the core concepts Ciera and Collin laid out here are applicable to organizations of any size. I’m especially inspired by their approach to selecting metrics as well as their emphasis on using a mixed-methods approach to measurement.