Measuring PR throughput—perspectives from SPACE author Brian Houck
PR throughput is a controversial metric. Brian provides advice on where it’s useful and how to leverage it.
Welcome to the latest issue of Engineering Enablement, a weekly newsletter sharing research and perspectives on developer productivity. If you want insights like this delivered to your inbox, subscribe here:
Last week I published a podcast episode with Brian Houck, developer productivity researcher at Microsoft and co-author of the SPACE framework.
One of the topics I was excited to discuss was the use of PR throughput as a productivity metric. We included PRs per engineer in the DX Core 4 framework (announcing soon), but not without debate: there were differing views on the metric among the researchers involved.
Several years ago, I gave a talk at GitHub Universe where I criticized the use of this metric. But my opinion has since changed: I’ve seen the metric be useful when used properly as part of a basket of metrics, and only measured in aggregate (not by individual).
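To make "measured in aggregate" concrete, here is a minimal sketch in Python. It buckets merged PRs by week and divides by team size, deliberately discarding who authored each PR. The function name and data shape are hypothetical, not from any particular tool:

```python
from collections import defaultdict
from datetime import date

def prs_per_engineer_per_week(merge_dates: list[date], team_size: int) -> dict:
    """Aggregate PR throughput: merged PRs per engineer, bucketed by ISO week.

    Deliberately ignores who authored each PR, so the result can only be
    read at the team level, never per individual.
    """
    weekly = defaultdict(int)
    for d in merge_dates:
        year, week, _ = d.isocalendar()
        weekly[(year, week)] += 1
    return {wk: count / team_size for wk, count in sorted(weekly.items())}

# Hypothetical merge dates, as you might export them from source control.
merges = [date(2024, 6, 3), date(2024, 6, 4), date(2024, 6, 5), date(2024, 6, 11)]
print(prs_per_engineer_per_week(merges, team_size=3))
# {(2024, 23): 1.0, (2024, 24): 0.3333333333333333}
```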
Brian is someone who I’ve known to be an advocate for using this metric, and he has many years of first-hand experience incorporating it in his work at Microsoft. Below is a transcript from our discussion—you can tune into the full podcast episode here.
A somewhat controversial part of the SPACE framework was that it included measuring developer activity. Since you were involved in adding activity as a dimension, what was the thought process at the time?
Brian: When we were creating the SPACE framework, there was a lot of debate about which dimensions to include. In the end, we didn’t add activity by accident—it was included for very good reasons. The key is understanding the context in which you use it.
Personally, and maybe somewhat surprisingly, I am a huge fan of PR throughput as a metric, though its value depends entirely on how it’s used. It’s important to know where it works well and where it doesn’t. For example, it is incredibly poor as a measure of individual productivity. If you’re using it to assess developers, I’d argue it’s an outright inappropriate metric.
“It is incredibly poor as a measure of individual productivity. If you’re using it to assess developers, I’d argue it’s an outright inappropriate metric.”
However, it’s incredibly powerful and useful for measuring the health of your system. It can reveal friction in the developer experience, particularly when it comes to code flowing through the system, and it helps identify where developers may be struggling to efficiently complete their work.
So understanding where and how you can use PR throughput—and equally important, where you shouldn’t—can make it a really powerful tool for uncovering insights in meaningful ways.
“It’s incredibly powerful and useful for measuring the health of your system. It can reveal friction in the developer experience, particularly when it comes to code flowing through the system, and it helps identify where developers may be struggling to efficiently complete their work.”
I’ve talked to many well-known organizations that use PRs per engineer as a central part of how they’re thinking about and measuring productivity. What’s your opinion on that practice?
Brian: I think PR throughput is useful as a top-level metric as long as it’s part of a broader basket of metrics, including things like developer satisfaction. I’m a proponent of it here at Microsoft because measuring across multiple dimensions creates a system of checks and balances.
At the end of the day, we see that increasing PR throughput is not only good for the business, it’s good for developers. Reducing friction in the PR process not only helps developers flow more code, but it also makes them happier. They feel more productive, and no one wants friction in their tools and processes.
PR throughput shines a light on many different sources of developer pain, whether that's slow CI builds and test passes or friction in human processes like code review. As you remove that friction, you make developers' lives better, and you can also move code through your system more efficiently, which is good for the business.
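To make that concrete, here is a minimal sketch of how PR data can localize this kind of friction: it splits a single PR's lead time into stages so that, looked at in aggregate, a slow stage (say, waiting for first review) stands out. The timestamps and field names are hypothetical, not from any particular platform's API:

```python
from datetime import datetime

# Hypothetical lifecycle timestamps for a single PR.
pr = {
    "opened":       datetime(2024, 6, 3, 9, 0),
    "first_review": datetime(2024, 6, 4, 15, 0),
    "ci_green":     datetime(2024, 6, 4, 16, 30),
    "merged":       datetime(2024, 6, 5, 11, 0),
}

def stage_hours(pr: dict) -> dict:
    """Split one PR's lead time into stages so slow segments stand out."""
    def hours(start: str, end: str) -> float:
        return (pr[end] - pr[start]).total_seconds() / 3600
    return {
        "waiting_for_first_review": hours("opened", "first_review"),
        "review_to_ci_green":       hours("first_review", "ci_green"),
        "ci_green_to_merge":        hours("ci_green", "merged"),
    }

print(stage_hours(pr))
# {'waiting_for_first_review': 30.0, 'review_to_ci_green': 1.5, 'ci_green_to_merge': 18.5}
```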
It’s interesting: when Nicole [Forsgren] and I last talked about it, she similarly said, “It’s not perfect, but it’s pretty darn good.” Generally, we see that as long as you avoid the don’ts, PRs per engineer is a useful input into understanding productivity.
Brian: I actually have a real-world example that illustrates this. In March of 2020, when Microsoft—like much of the world—moved to remote work, we weren’t a particularly remote-friendly company. Most of our developers weren’t used to working remotely, and we weren’t sure if our developer infrastructure could handle the sudden shift.
PR throughput became a key way to evaluate whether our systems could manage the load of all developers going remote at once. What we found was that PR throughput actually went up dramatically. There were some learning curves during the first few days, but after that, we saw it skyrocket.
I often use this example to point out that a rise in PR throughput isn’t always a good thing. In this case, it showed that while our systems were healthy, other factors were at play. We discovered that part of the increase was due to developers working longer hours—they were in the early days of the pandemic, quarantined, with nothing else to do. At the same time, 78% of our developers reported feeling burned out.
This was a reminder that it’s not just about asking, “Is the metric going up?” It’s about understanding why it’s changing. Without that dramatic shift in PR throughput, we might not have looked deeper into what was happening with our developers.
One of the challenges of PR throughput is that it gets associated unfairly with other metrics like counting lines of code. The comment is always, “This is actually just lines of code repackaged up into bigger units, right?” And they sound similar, but they’re really not the same thing. PRs are units of value and work that are not tied to the number of lines of code.
“One of the challenges of PR throughput is that it gets associated unfairly with other metrics like counting lines of code. The comment is always, “This is actually just lines of code repackaged up into bigger units, right?” And they sound similar, but they’re really not the same thing. PRs are units of value and work that are not tied to the number of lines of code.”
Have you seen any impact on PR throughput from the GenAI tools you’re experimenting with at Microsoft?
Brian: Based on both internal research at Microsoft and external research from other organizations, it’s clear that most engineering teams deploying AI tools are seeing an increase in PR throughput. These tools are helping developers complete tasks faster. However, as with most things, there’s nuance: AI doesn’t necessarily speed up all types of tasks.
It’s also important to remember that not all developer work involves creating PRs. A lot of developer work isn’t actively coding. In fact, studies show developers spend only 15–20% of their day coding, and tools like GitHub Copilot may not impact the remaining 80–85% of their work.

As with applying the SPACE framework to any problem, the goal isn’t to rely on a single metric but rather to use a basket of metrics. While an increase in PR throughput is valuable and correlates with developer satisfaction and feelings of productivity, it needs to be balanced with other metrics. Are developers happy using AI? Do they feel it makes them more efficient? Is it improving their collaboration with peers in a healthy way?
PR throughput is a useful metric for evaluating the impact of AI, but it shouldn’t be used in isolation. That’s true for this and many other metrics. Context and balance are key.
What’s your advice on how to present and communicate this metric in a way that avoids backlash, especially from developers?
Brian: When I talk with developers at Microsoft, I try to be unwavering in how I present metrics, always framing them through the lens of, “This is how improving this metric will make your life better.” I don’t advocate for metrics unless I can back them with research showing that improving them will genuinely benefit developers.
I also try to anticipate and address potential criticisms upfront. I always emphasize the importance of using a basket of metrics as a system of checks and balances. But ultimately, I rely on rigorous research to show developers, “Improving this metric will make your life better, and here’s how.”
Have you considered any variations of this metric? Organizations often point out that “not all PRs are created equal,” which adds some fuzziness to the number. Have you experimented with things like weighted PR throughput or classifying different types of PRs and measuring them differently?
Brian: We’ve definitely thought about it, and we’re always trying to find ways to classify PRs that make this metric more actionable. The honest truth is that we haven’t found anything yet that’s any more actionable than simply looking at it in its sum total.
Interestingly, one of the advantages of this metric is that while it might seem susceptible to gaming—like breaking up work into smaller PRs—that actually turns out to be a good thing. Smaller, more frequent PRs result in more code flowing through the system with less friction. So, the metric is quite resilient to gaming.
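For readers wondering what a weighted variant could even look like, here is a hypothetical sketch. The categories and weights are invented purely for illustration, and, as Brian notes, plain counts have proven just as actionable in practice:

```python
# Invented categories and weights, purely to illustrate the idea of a
# weighted variant; nothing here reflects how Microsoft measures PRs.
WEIGHTS = {"feature": 1.0, "bugfix": 0.8, "config": 0.3, "docs": 0.2}

def weighted_throughput(pr_categories: list[str], team_size: int) -> float:
    """Sum category weights instead of raw PR counts, per engineer."""
    total = sum(WEIGHTS.get(cat, 1.0) for cat in pr_categories)
    return total / team_size

print(weighted_throughput(["feature", "docs", "bugfix"], team_size=3))
# (1.0 + 0.2 + 0.8) / 3 = 0.666...
```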
Who’s hiring right now
This week’s featured Developer Productivity job openings. Find more open roles here.
Pinterest is hiring a Senior PM - Infrastructure | San Francisco, CA
Snowflake is hiring an Engineering Manager - DevEx | San Mateo, CA
Lely is hiring a Senior SRE - DevEx | Netherlands
Capital One is hiring multiple DevEx roles | US
That’s it for this week. Thanks for subscribing to Engineering Enablement. This newsletter is free, so feel free to share it.
-Abi