Measuring developer productivity: A clear-eyed view
Kent Beck grilled me on Goodhart’s Law, benchmarks, and the purpose of measurement.
Welcome to the latest issue of Engineering Enablement, a weekly newsletter sharing research and perspectives on developer productivity.
It was on an otherwise typical Tuesday morning that someone from my marketing team told me they’d lined me up to be interviewed by Kent Beck.
My first reaction was “what!?”. Kent and I have corresponded for years about developer productivity metrics. And both publicly and privately, Kent has been a staunch and vocal critic of metrics.
When Kent and I jumped on our Zoom call to record our interview, I joked that “I guess I’m about to get grilled.” Kent laughingly said: “Yep, and the irony of this is that you guys paid me to grill you.”
If I’m depicting Kent as the villain of this story, the truth is quite the opposite. Kent’s self-described personal mission is to “help geeks feel safe in the world”. At DX, not only are we pioneering research to improve developer experience, but our metrics also give developers a platform to surface the issues that make their lives worse.
So I’ve always seen Kent and me as being on the same team. Like Kent, I have repeatedly seen leaders roll out metrics with disastrous results. And like Kent, I see “developer experience” as conveying the wrong message to the business.
So I was really excited (though also nervous) for a candid, on-the-record conversation about these topics and more. Kent challenged me on Goodhart’s Law, using benchmarks, and even DX’s business model.
Before the interview, Kent warned me that it would be a “very challenging interview”. To our mutual pleasant surprise, we found more common ground on the topic than either of us expected.
Here’s the interview, reposted from Kent’s blog.
Goodhart’s Law
Kent: I would like to adopt a new resolution that any conversation about metrics has to start with Goodhart's Law. And to your credit, in the podcast I listened to, there was a mention of Goodhart's. But oh my goodness, it's so important. When you pitch metrics, it's in your best interest that people adopt them. How do you navigate both communicating the upside and the potential downside?
I once lived next door to a guy who sold get-rich-quick options trading advice. The SEC stepped in and said you have to make these disclaimers to continue in business. He said if he made those disclaimers, people would stop buying - it was the sleaziest thing you could imagine. He made the disclaimers and went bankrupt. Your advice isn't bad, but it has significant downsides. How do you navigate that?
Abi: I want to say one thing that I think is often lost in this conversation. There are really two fundamental use cases for these types of metrics. One is for engineering leaders and managers to gauge performance and try to improve it. I think that's the one that really gets talked about - whenever metrics come up, it's all about that.
But in the world we live in, there's a second, almost primary use case: measuring the success of investments in engineering systems and platform work. At Gusto, the CTO looks at developer experience metrics, but the person we actually work with is the head of developer experience, who's trying to fix CI tooling. For him, these metrics are a communication tool for reporting up to the business. Developers aren't even thinking about those metrics - they're an internal KPI for the engineering systems team.
One thing that we give as guidance to companies - and I stole this from Google, they've said this best - is that Goodhart's Law is 100% true. As soon as these measures become something folks are incentivized to game and manipulate, the very signal you worked so hard and paid for becomes useless or inaccurate.
Kent: It's worse than inaccurate, because people degrade the system to produce the number.
Abi: Yeah, it's worse than just the numbers being bad - it's actually harming how software is produced. One of the keys is how these metrics are communicated. The way Google does it, and the way we advise our customers, is that the goal of measurement is to be an ally to developers. Ultimately, the goal here is to remove what you called “undifferentiated friction” from doing work. When that's the message - and all these Core Four metrics are about making it easier to get more done, not measuring output - that's the way to talk about it with developers and teams.
Kent: You have to genuinely be an ally. This isn't about communication strategy. There's a kind of sin - I was born and raised here in Silicon Valley watching my dad be an engineer, I've seen this play out so many times. It's like, “Well, how can I make it sound like I'm on the programmer's side? Because if they're convinced I'm on their side, then I'm gonna make a bunch of money, even though I'm screwing them over.” It's just a way to get more out of people without actually giving them anything back. If that's your actual goal, you can paint over it with active listening and pretend empathy, but eventually it'll show through and be negative for the world.
Abi: That cynical point of view is valid, though I don't generally agree based on what we see. I think a lot of folks really are trying to be allies to developers, especially when you think of engineering systems teams where developers are their customers.
I think the three key things are:
1. Communicating and following through on being an ally to developers
2. Never measuring metrics like diffs per engineer at the individual level - that's something we stipulate in our framework and build into our product
3. Using each metric within a basket of metrics - developer experience is as important as speed, and quality balances both
People can still punch through that and just focus on speed, but by making those four pillars [ed: Speed, Effectiveness, Quality, & Impact] counterbalance each other, it helps steer everyone away from just thinking about how many widgets are being cranked out.
Developer experience
Kent: Extreme Programming is deliberately divided into values, principles and practices. That was a choice I made early on because I had seen a focus on practices get twisted. People would say “Oh, I'm doing this because I'm supposed to do TDD.” The values level is too vague - we say we value innovation but look around and it's all about executing according to plan. I wanted both of those in place, with principles in between, because every situation is different. You have to take context into account to derive the practices you want to use right now, which may be different next month.
I see those four pillars at the level of practices, but there's a set of values around that you're communicating effectively. I kind of flinch at “developer experience” because it isn't just about the developer. I've been to meetings where it's like “Oh, the developers aren't happy, we want to make developers happy.” Don't make me happy! I remember one of my kids' sports teams, the coach said, “We're here to have fun. You know what's fun? Winning. If that means we have to practice extra hard or do more drills so you're more capable of winning, and you don't like doing more drills - well, we're aimed at a higher level of joy than whether you have happy-clappy fun today.”
Abi: Well, you and I had that email exchange about developer experience, and it really resonated. You're not alone in that reaction. That's why we call it “effectiveness” in our framework. A lot of people ask why the developer experience index sits under effectiveness - even folks like Peggy Storey said, “No, that should be called developer experience.” GitHub says it should be called developer happiness. But it needs to be called effectiveness, because that's what we're actually trying to measure - not developer happiness but effectiveness.
These questions are about workflow friction and process effectiveness, not, “Do I have enough free beer?” or “Am I happy?” We measure on a frequency scale from never to always, trying to assess friction in workflows. Our intention with developer experience is very much not about comfort and happiness - it's measuring the process through the lens of the developer.
Kent: If you're happy every day and your product ends up tanking and you get fired, that's an even worse experience than being unhappy every day and having your product fail. Because then your stock options are worthless and you felt good the whole time before this horrible reversal.
I wanted to come back to something you said about eliminating friction or waste. There's a message that comes through to programmers, whether intended or not, which is, “We need you to hurry up” - and that trick never works. Yet somewhere in the communication between what the business wants and what programmers hear, that message comes through. I'm not interested in whose fault it is, I'm just interested in how we're going to say something different. Some of the questions you showed me weren't about what would make you go faster, but what's making you go slower.
Abi: John Cutler wrote an article called “Anti-productivity” - he said you can't measure productivity but you can measure anti-productivity. That's the spirit of what we're trying to do with the developer experience index. We're trying to measure the waste, because if we get rid of that, we can go faster - versus telling people to go faster.
The way we're validating this survey is by correlating it with developers' self-reports of what percentage of their time is lost to inefficiencies. Based on our latest analysis, each one-point gain in the DX score corresponds to 13 minutes per week per developer of time saved, based on what developers themselves are saying.
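To make that figure concrete, here is a back-of-the-envelope sketch. It assumes the 13-minutes-per-point relationship scales linearly with both the score gain and the number of developers; the organization size, score change, and 46 working weeks per year are hypothetical values chosen for illustration, not DX data.

```python
# Back-of-the-envelope estimate of time recovered from a DX score gain.
# Assumption: the quoted ~13 minutes/week/developer per one-point gain scales
# linearly. Org size, score gain, and working weeks below are hypothetical.

MINUTES_PER_POINT_PER_DEV_PER_WEEK = 13  # figure quoted in the interview


def weekly_hours_saved(score_gain: float, developers: int) -> float:
    """Estimated developer-hours recovered per week across the whole org."""
    minutes = score_gain * MINUTES_PER_POINT_PER_DEV_PER_WEEK * developers
    return minutes / 60


def yearly_hours_saved(score_gain: float, developers: int, working_weeks: int = 46) -> float:
    """Annualized version of the weekly estimate; working_weeks is an assumed value."""
    return weekly_hours_saved(score_gain, developers) * working_weeks


if __name__ == "__main__":
    # Hypothetical org: 200 developers, DX score improves by 3 points.
    print(f"{weekly_hours_saved(3, 200):.0f} developer-hours per week")  # 130
    print(f"{yearly_hours_saved(3, 200):.0f} developer-hours per year")  # 5980
```

The point of the exercise is scale: a few minutes per developer per week adds up across an organization. It remains an estimate built on self-reported data, as Abi notes, not a direct measurement.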
Kent: [Looking skeptical] How many developers have actually experienced a really effective development environment? Do they know what really good looks like?
I've recently begun talking about the Forest and the Desert. The Desert is typical development where there's just not enough time to go around - not time to learn, not time to help others. God bless them, people actually produce value in the Desert and it's sustainable and profitable. Kids go to college, houses get paid for - fantastic. But that's not the only possibility. There is this Forest. People come up to me saying in 2005-2007 they were on an Extreme Programming team and it was the best three years of their career. I just want to cry - here's somebody who knows the difference and can't create it. So I'm skeptical about this question because I'm not sure the majority of developers have ever seen what efficient, in this positive sense of lack of friction, actually feels like.
Abi: That's fair. But directionally, I think it yields insight - people know what they're frustrated about. They know when they can't get their diffs approved or tests take too long to run. That's why we juxtapose it with objective measures like throughput. None of these measures are perfect individually or collectively, but they help us get closer and more aligned on how to talk about and approach this.
Using benchmarks
Kent: One critique I have of Core Four is around interpreting results, especially comparisons. I heard the word “benchmark” used a lot. There's a huge difference between exploratory work and extraction work. If you don't recognize that context, the same number could be good or bad, and the same change could be good or bad, depending on where you are.
Abi: I think the best analogy is health metrics. Every year I get an advanced lipid panel with tons of numbers. Without benchmarks or guideline ranges, the numbers would mean nothing - that's the first purpose of benchmarks, giving you a frame of reference. Now, many of my numbers are above or below the range, and interpreting whether that's a problem or whether the range is even the right target for me requires analysis with my doctor.
It's the same with engineering metrics. The benchmarks aren't law - they're a frame of reference. We try to benchmark like-to-like: mobile teams work very differently than hardware teams or API teams. Another analogy: we have SMB and enterprise sales teams. We track the same metrics but would never compare them against each other because they're not doing the same deal sizes.
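A minimal sketch of the like-to-like idea: a team's number is ranked only within its own segment, never against a pooled figure. The segment names, the metric (for example, a team-level average of diffs per engineer), and every value below are hypothetical, and a simple percentile rank is just one possible approach, not DX's actual benchmarking method.

```python
from bisect import bisect_left, bisect_right

# Hypothetical benchmark data for one team-level metric, grouped by team type.
# All values are invented for illustration.
BENCHMARKS: dict[str, list[float]] = {
    "mobile": [1.8, 2.1, 2.4, 2.6, 3.0, 3.3],
    "api": [3.5, 4.0, 4.2, 4.8, 5.1, 5.6],
}


def percentile_in_segment(segment: str, value: float) -> float:
    """Percentile rank (0-100) of `value` among peers in the same segment.

    Ties count as half, the usual midpoint convention. Raises KeyError when
    no like-to-like cohort exists, rather than silently comparing across segments.
    """
    peers = sorted(BENCHMARKS[segment])
    below = bisect_left(peers, value)
    equal = bisect_right(peers, value) - below
    return 100.0 * (below + 0.5 * equal) / len(peers)


if __name__ == "__main__":
    # Same raw number, very different position depending on the cohort.
    print(percentile_in_segment("mobile", 3.0))  # 75.0
    print(percentile_in_segment("api", 3.0))     # 0.0
```

In this made-up data, a value of 3.0 sits at the 75th percentile for mobile teams but below every API team. As with the lipid panel, the result is a frame of reference to interpret, not a verdict.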
Implementing metrics
Kent: How much are you doing consulting versus selling tools?
Abi: We are very white-glove. Organizations want a solution to collect data and analyze it, but they equally want help developing their point of view on productivity. A lot of it is philosophy and change management and communication, just as much as providing dashboards.
Kent: I would go so far as to say if you just sell the dashboard, that's malpractice.
Abi: Yeah, it's like getting a lipid test but not being able to meet with a physician who can properly interpret it.
Kent: You have an educational challenge - an education business with tool support. That's how it has to be, because if you just sell the tools, it's going to turn bad. Look at the shift in the political climate toward the surveillance economy - that's the societal direction we're going. For creative work, that's a disaster.
Abi: I'd recharacterize it as selling strategy. Ultimately what we sell is understanding how to use and interpret numbers and implement them in your organization without screwing up. It's about knowing what changes to make, in what order, and when to stop investing in something.
Kent: I'm glad you're doing what you're doing. I think self-awareness is really valuable. I am afraid of the misuse of this kind of information. With my forest and desert analogy, I'm trying to communicate to executives that there is this other way of being with technological development that's better for everybody, but it's a different equilibrium than people are living in. I think you're going in that same direction but driven from a data and education perspective, while mine is more storytelling. I think what we're doing is really complementary.
Who’s hiring right now
This week’s featured job openings. See more open roles here.
Realtor is hiring a Head of Developer Productivity | Austin, TX
Amazon is hiring a Senior Programmer Writer - ASBX | Seattle and NYC
Pinterest is hiring a Senior PM - Infrastructure | San Francisco, CA
Lyft is hiring an Engineering Manager - DevEx | Toronto, Canada
The Hartford is hiring a Sr Staff Platform Engineer | Multiple cities, US
Thanks for reading. If you enjoyed this issue, consider sharing it.