Google’s principles for measuring developer productivity
“All models are wrong, but some are useful.” How Google makes sure their models for measuring productivity are useful and not problematic.
Welcome to the latest issue of Engineering Enablement, a weekly newsletter sharing research and perspectives on developer productivity.
This week I read Measuring Productivity: All Models Are Wrong, But Some Are Useful by Google researchers Ciera Jaspan and Collin Green. This paper gives an inside look at how Google approaches developer productivity measurement in a way that's useful and not problematic. The lessons they share may serve as a guide for leaders to evaluate the metrics and frameworks their teams are using.
My summary of the paper
Models are used to explain, describe, or predict the world, and are ideally made as simple as possible. (Simpler models are easier to understand and explain.) However, this simplification comes at a cost: we have to decide what to include and what to leave out in any model, and bad models will omit important details that undermine their utility.
“When you construct a model you leave out all the details which you, with all the knowledge at your disposal, consider inessential… Models should not be true, but it is important that they are applicable.” — George Box, British statistician, 1976
Measuring engineering productivity is fundamentally an exercise in model building. It requires selecting, mapping, and validating relationships between inputs and outputs. It also requires other careful considerations in order for the model to be both useful and not problematic. Here, the authors share the principles they’ve developed over time that shape how they measure productivity today.
1. Avoid single-metric models. “We can’t rely on a model that misses a major aspect of developer productivity.” Single-metric models don’t capture the inherent tradeoffs underlying productivity. A telltale sign of single-metric thinking is when leaders ask, “What’s the best metric for developer productivity?”
2. Measure all outcomes you care about, and capture multiple metrics for each outcome. Avoid trying to select one metric for each outcome you care about. If you’re measuring speed, capture multiple metrics for speed. The same applies to measuring something more specific, such as builds, tests, or deploys. Capturing multiple metrics helps make sure we’re getting a fuller picture and catching any discrepancies.
“One should carefully pick multiple metrics and, ideally, utilize more comprehensive subjective metrics to avoid blind spots… Adding in a subjective and general measurement of engineering velocity allows one to get at the broader context of productivity (the context that might be missed by a narrower metric), and it also presents the opportunity for discrepancies to arise. If two measures of engineering speed move in opposite directions, that is interesting and worth further investigation.”
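The discrepancy check described above can be sketched in code. This is a hypothetical illustration, not something from the paper: the metric names are invented, and both example metrics are framed so that higher values are better.

```python
# Hypothetical sketch of principle 2: track multiple metrics for one
# outcome (speed) and flag pairs that move in opposite directions.
# Metric names are invented; both are oriented so higher is better.

def pct_change(prev, curr):
    """Relative change between two quarterly readings."""
    return (curr - prev) / prev

def find_discrepancies(metrics_prev, metrics_curr):
    """Return pairs of metrics whose changes have opposite signs.

    A divergence isn't an error; per the paper, it's a signal
    worth further investigation.
    """
    changes = {
        name: pct_change(metrics_prev[name], metrics_curr[name])
        for name in metrics_prev
    }
    names = list(changes)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if changes[a] * changes[b] < 0:  # opposite directions
                flagged.append((a, b))
    return flagged

# An objective, log-based metric improves while the subjective,
# survey-based velocity rating declines -> worth a closer look.
prev = {"deploys_per_week": 12.0, "self_reported_velocity": 3.8}
curr = {"deploys_per_week": 15.0, "self_reported_velocity": 3.5}
print(find_discrepancies(prev, curr))
```

In practice the interesting output is not the flag itself but the follow-up conversation it triggers about which measure is missing context.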
3. Be mindful of incentives created by measurement. “When a measure becomes a target, it ceases to be a good measure.” Developers are humans, and as such, they will have a strong incentive to hit any target one sets for productivity. Setting a goal around a single metric or even a single aspect of productivity (e.g., speed) can create incentives that lead to poor tradeoffs or even undesired behaviors. Models that explicitly seek to capture tradeoffs and avoid blind spots are less susceptible to creating unwanted incentives.
4. Measure different facets of productivity. This principle is similar to, but distinct from, #1 and #2. Google’s team always makes sure they select a reliable set of productivity metrics with good coverage across speed, ease, and quality—the three dimensions of productivity—rather than focusing on just one dimension.
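A simple way to make this principle concrete is a coverage check over a metric catalog. The three dimension names come from the paper; the metric names below are invented for illustration.

```python
# Hypothetical sketch of principle 4: verify that a chosen metric set
# covers all three dimensions of productivity (per Jaspan & Green).
# The metric names are invented examples.

DIMENSIONS = {"speed", "ease", "quality"}

# Each candidate metric is tagged with the dimension it informs.
metric_dimensions = {
    "median_build_time": "speed",
    "review_turnaround_hours": "speed",
    "survey_ease_of_development": "ease",
    "post_release_defect_rate": "quality",
}

covered = set(metric_dimensions.values())
missing = DIMENSIONS - covered
print("missing dimensions:", sorted(missing) or "none")
```

Running a check like this before committing to a dashboard makes the blind spot explicit: if `missing` is non-empty, the model is skewed toward whichever dimension is easiest to instrument.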
5. Use system-based and self-reported data together. Both methods for capturing data have their own strengths: system-based data (e.g., from logs) is objective and scales well, while self-reported data captures experience and context that systems can’t observe. Used together, they offset each other’s blind spots.
The authors conclude with a few reminders:
Small teams (ones that can fit in a conference room, as the authors put it) don't need to measure productivity, since they can simply get together and have a discussion.
Larger teams should start with surveys: "[surveys] get you surprisingly far... you can get a sense of the broader experience of developers at your company and prioritize improvements accordingly."
Always start by defining your reason for measuring.
Final thoughts
This paper provides a set of principles for measuring developer productivity in a way that’s useful and doesn’t cause unintended tradeoffs. Leaders can use the guidance in this paper to evaluate their current approach.
Who’s hiring right now
This week’s featured job openings. See more open roles here.
Adyen is hiring a Team Lead in Platform Engineering | Amsterdam, Netherlands
Snowflake is hiring a Director of Engineering - Test Framework | Bellevue and Menlo Park
Realtor is hiring a Head of Developer Productivity | Austin, TX
Lyft is hiring an Engineering Manager - DevEx | Toronto, Canada
Capital One is hiring multiple roles - DevEx | Multiple cities (US)
Amazon is hiring a Senior Programmer Writer - ASBX | Seattle and NYC
Thanks for reading. If you’re enjoying this newsletter, consider sharing it.
-Abi