No Single Metric Captures Productivity
Tracking individual developer activity may not be necessary for measuring productivity.
This is the latest issue of my newsletter. Each week I cover research and perspectives on developer productivity.
This week I read No Single Metric Captures Productivity, an article by Google researchers Ciera Jaspan and Caitlin Sadowski included in the book “Rethinking Productivity in Software Engineering”. For teams considering whether to track developer activity metrics, this paper provides a helpful analysis: it explores the motivations for using such metrics, distills what research says about their value for understanding productivity, and suggests alternative approaches.
My summary of the paper
Organizations have attempted to measure developer productivity for decades. Early attempts tracked lines of code, and similar approaches persist today: for example, tracking developers’ commit activity and pull requests. The authors argue that attempts to use such metrics to measure developer productivity will not produce the desired results.
Measuring individual developer activity may not be necessary for understanding productivity
There are several possible motivations for measuring individual developer activity, one of which is to identify high- and low-performing individuals and teams. However, the authors argue that using metrics to identify low performers may not be necessary: “It is our experience that managers (and peers) frequently already know who the low performers are. In that case, metrics serve only to validate a preexisting conception for why an individual is a low performer, and so using them to identify people in the first place is not necessary.”
Further, developers engage in many development tasks beyond writing code. The authors say, “When we create a metric, we are examining a thin slice of a developer’s overall time and output,” which makes using activity metrics problematic for assessing productivity. They continue: “Even for the narrow case of measuring productivity of developers in terms of code contributions, quantifying the size of such contributions misses critical aspects of code such as quality, or maintainability. These aspects are not easy to measure; measuring code readability, quality, understandability, complexity, or maintainability remain open research problems.”
Finally, tracking these types of metrics can create a morale problem, and morale problems can negatively impact overall productivity. This stems primarily from developers’ attitudes toward such metrics and their concern that measurements could be misinterpreted. And if these metrics feed into developers’ performance reviews, “These high stakes further incentivize gaming the metrics.”
Attempts to create a single metric that adequately captures productivity are counterproductive
Some organizations attempt to create a single metric to capture productivity, often a formula that combines various factors. The authors say this is problematic for two reasons:
Flattening or combining aspects into a single measure makes the measure harder to understand and less actionable. “If a variety of factors (e.g., cyclomatic complexity, time to complete, test coverage, size) are compressed into one number representing the productivity impact of a patch, it will not be immediately clear why one patch scores 24 and another one scores 37.” Further, these scores may be misleading: a higher number is not necessarily better (see the sketch after these two points).
Confounding factors can make a metric meaningless. The types of projects a team works on, the processes it follows, and its languages, tools, and culture can all affect how productive one team appears to be, according to activity metrics, when compared to the next.
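To make the first point concrete, here is a minimal sketch of such a composite score. The weighting scheme and the factor values are entirely invented, not from the paper:

```python
# Hypothetical composite "productivity score" for a patch, flattening the
# factors the authors list into one number. All weights are made up.
def patch_score(complexity: int, hours: float, coverage: float, size_loc: int) -> int:
    return round(
        10 * coverage       # reward test coverage
        - 0.5 * complexity  # penalize cyclomatic complexity
        - 0.2 * hours       # penalize time to complete
        + 0.01 * size_loc   # reward size, which is already questionable
    )

big_patch = patch_score(complexity=12, hours=20, coverage=0.9, size_loc=2400)
small_patch = patch_score(complexity=3, hours=6, coverage=0.5, size_loc=300)
print(big_patch, small_patch)  # 23 5
# Why did one patch score 23 and the other 5? The single number can't say,
# and the "better" score here comes mostly from raw size, not quality.
```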
An alternative solution: learn from how Google measures productivity
As a suggested path forward, the authors describe how Google approaches productivity improvements: teams at Google focus on making data-driven improvements to specific workflows.
They start with a concrete research question and then seek metrics to address it. They call this the Goals, Signals, Metrics framework, which is further detailed in Jaspan’s paper, Measuring Engineering Productivity. Here’s a quick overview of the approach, with a small sketch of the structure after the list:
Goals: Start by defining the goal you’re trying to accomplish. For example, “Engineers write higher-quality code as a result of the readability process.”
Signals: Then, define signals. A signal is how we know we’ve accomplished the goal. For example, “Engineers who have been granted readability judge their code to be higher quality than those who have not been granted readability,” and “The readability process has a positive impact on code quality.”
Metrics: Then, select metrics, which serve as proxies for the signals. For example, a quarterly survey that measures the proportion of engineers who report being satisfied with the quality of their own code, or a readability survey that measures the proportion of engineers reporting that readability reviews have no impact or a negative impact on code quality.
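One way to picture the framework is as a simple nested structure, traversed top-down from goal to metrics. This encoding of the readability example is my own illustration, not code from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    statement: str  # how we would know the goal has been accomplished
    metrics: list[str] = field(default_factory=list)  # measurable proxies

@dataclass
class Goal:
    statement: str  # the outcome we are trying to accomplish
    signals: list[Signal] = field(default_factory=list)

readability = Goal(
    statement="Engineers write higher-quality code as a result of the "
              "readability process.",
    signals=[
        Signal(
            statement="Engineers who have been granted readability judge their "
                      "code to be higher quality than those who have not.",
            metrics=["Quarterly survey: proportion of engineers satisfied "
                     "with the quality of their own code"],
        ),
        Signal(
            statement="The readability process has a positive impact on "
                      "code quality.",
            metrics=["Readability survey: proportion of engineers reporting no "
                     "or negative impact of readability reviews on code quality"],
        ),
    ],
)
```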
After selecting metrics, teams at Google go through an exercise of validating them, asking questions such as “What action will be taken if we get a positive or negative result?” They also validate objective metrics with qualitative research to ensure the metrics measure the original goal. To illustrate, the authors recall an example in which they examined distributions of log events and discovered developers who were performing a single action on a web page tens of thousands of times. This looked like an anti-pattern; however, by asking the developers, they learned it was caused by a Chrome extension.
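As a rough illustration of that kind of validation step (my own sketch, not the authors’ tooling): before trusting a count-based metric, inspect its per-developer distribution and follow up on outliers qualitatively.

```python
from collections import Counter
from statistics import median

# Hypothetical per-developer counts of a logged UI action (invented data).
events = Counter({"dev_a": 41, "dev_b": 55, "dev_c": 38, "dev_d": 23_000})

typical = median(events.values())
outliers = {dev: n for dev, n in events.items() if n > 50 * typical}
print(outliers)  # {'dev_d': 23000}
# The number only flags the anomaly; asking the developer explains it.
# In the authors' case, the culprit turned out to be a Chrome extension.
```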
This approach allows teams at Google to make data-driven improvements by starting with goals first, instead of starting with the vague concept of productivity.
Final thoughts
Whether to track developer activity metrics as a way to understand developer productivity is an often-debated question. The points made in this paper can help leaders navigate the potential pitfalls of these types of metrics while also weighing an alternative approach.
That’s it for this week! Share your thoughts and feedback anytime by replying to this email.
If this email was forwarded to you, subscribe here so you don’t miss any future posts:
-Abi