Operationalizing developer productivity metrics
How to integrate metrics into teams’ decision-making processes.
Welcome to the latest issue of Engineering Enablement, a weekly newsletter sharing research and perspectives on developer productivity.
This week, DX’s CTO Laura Tacho shares a practical guide on how to take action on developer productivity metrics.
It’s a challenge many teams struggle with: they have the data, but they don’t know where to start. Some end up using metrics only for reporting and miss the opportunity to drive change. Other leaders unintentionally overwhelm teams with too much data.
This guide breaks down the core use cases for measurement, how to choose the right metrics, and how to use metric mapping to turn data into meaningful improvements.
Here’s Laura.
Developer productivity metrics are on everyone’s mind, and we’ve been hearing a lot of questions like “How do we know if we’re moving as fast as we can?” or “How can I measure the impact of tooling investments, like using GenAI for coding?” Right now, a lot of conversations about metrics are focused on what to measure. While I don’t believe that problem is perfectly solved, starting with a framework like the DX Core 4 helps you skip a lot of the brute-forcing and guessing about which metrics really matter.
Now we can bring the conversation up the problem ladder and talk about another thorny problem: once you have metrics, what on earth are you supposed to do with them?
It’s very common for teams to struggle with interpreting and acting upon developer productivity metrics – even high-performing teams that effectively use data in other parts of their work. Using data for continuous improvement is a separate skill that needs support at the organizational level.
We frequently talk with organizations that say, “We’ve just spent the last six months setting up DORA metrics, and now we’re trying to figure out what to do with them.”
When this happens, organizations often fall into the trap of:
Reverting to old habits – simply adding the metrics to leadership reports without driving real change.
Overwhelming teams with data – expecting teams to derive meaning from hundreds of measurements without providing adequate support or clear expectations.
Failing to connect metrics with decision-making – collecting data that sits unused in dashboards rather than influencing team behavior and strategy.
The key to making metrics useful is to integrate them into decision-making processes at every level—engineering teams, leadership, and platform teams—while ensuring that the right people are looking at the right data at the right time.
Use cases for metrics: engineering organizations and systems teams
There are two primary use cases for developer productivity metrics:
Engineering organizations – These teams use metrics to assess overall efficiency and drive continuous improvement at the organizational level. Leadership uses this data to guide transformation efforts, ensure alignment with business goals, increase quality, and improve engineering velocity. Teams use metrics to make daily and weekly decisions to improve their own performance and velocity.
Systems teams (Platform Eng, DevEx, DevProd) – These teams use metrics to understand how engineering teams interact with internal systems and to assess the ROI of DevEx investments. These metrics are existential for these teams: they need to demonstrate the impact of their work to show that the investment in them is paying off. The same measurements are also crucial for setting future priorities.
Understanding this use case will guide your approach to collecting the data (what metrics do I need to fulfill my use case?) as well as your approach to interrogating the data (what questions am I trying to answer, and for what purpose?).
Activities with metrics: diagnostics and improvement
Metrics have characteristics that make them most useful in particular contexts. Some help us see trends, while others can drive daily decisions at the team level.
Diagnostic metrics – These are high-level, summary metrics that provide insights into trends over time.
Collected with lower frequency
Benefit from industry benchmarks to contextualize performance
Best used for directional or strategic decision-making
Examples: DX Core 4 primary metrics, DORA metrics
Improvement metrics – These metrics drive behavior change.
Collected with higher frequency
Focused on smaller variables
Often within teams’ locus of control
Use this table for guidance on how to distinguish between a diagnostic and an improvement metric:

| | Diagnostic metrics | Improvement metrics |
| --- | --- | --- |
| Frequency | Collected with lower frequency | Collected with higher frequency |
| Scope | High-level summaries of trends over time | Smaller, more granular variables |
| Context | Benefit from industry benchmarks | Often within a team’s locus of control |
| Best used for | Directional or strategic decision-making | Day-to-day decisions and behavior change |
| Examples | DX Core 4 primary metrics, DORA metrics | Time to first PR review, CI flakiness |
You may go to the doctor once a year and get a blood panel to look at your cholesterol, glucose, or iron levels. This is a diagnostic metric: meant to show you a high-level overview of your total health, and meant to be an input into other systems (like changing your diet to include more iron-rich foods).
From this diagnostic, more granular improvement metrics can be defined. Some people wear a Continuous Glucose Monitor to keep an eye on their blood glucose after their diagnostic test indicated that they should work on improving their metabolic health. This real-time data helps them make fine-tuned decisions each day. Then, we expect to see the sum of this effort reflected in the next diagnostic measurement.
For engineering organizations, a diagnostic measurement like PR Throughput can show an overall picture of velocity and contextualize your performance against industry benchmarks. Organizations that want to drive velocity then need to identify improvement metrics that support this goal, such as time to first PR review. For example, the team could get a ping in Slack when a new PR is awaiting review, or when a PR has crossed a threshold of time without an approval. These metrics are more granular and targeted, and they allow the team to make in-the-moment decisions that drive improvement.
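As a rough sketch of what that nudge could look like, assuming a GitHub repository and a Slack incoming webhook (the repo name, webhook URL, and SLA threshold below are hypothetical placeholders, and a real setup would add an auth token and pagination):

```python
# Sketch: ping Slack when an open PR has waited too long for its first review.
from datetime import datetime, timezone

import requests

GITHUB_REPO = "acme/payments-service"  # hypothetical repository
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # hypothetical webhook
REVIEW_SLA_HOURS = 4  # threshold before we nudge the team


def prs_awaiting_review():
    """Return open PRs with no reviews yet that have exceeded the SLA."""
    prs = requests.get(
        f"https://api.github.com/repos/{GITHUB_REPO}/pulls",
        params={"state": "open"},
        timeout=10,
    ).json()
    stale = []
    for pr in prs:
        reviews = requests.get(pr["url"] + "/reviews", timeout=10).json()
        opened = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
        age_hours = (datetime.now(timezone.utc) - opened).total_seconds() / 3600
        if not reviews and age_hours > REVIEW_SLA_HOURS:
            stale.append((pr["title"], pr["html_url"], round(age_hours, 1)))
    return stale


def notify(stale_prs):
    """Post one Slack message per stale PR via the incoming webhook."""
    for title, url, age in stale_prs:
        requests.post(
            SLACK_WEBHOOK_URL,
            json={"text": f"PR awaiting first review for {age}h: <{url}|{title}>"},
            timeout=10,
        )


if __name__ == "__main__":
    notify(prs_awaiting_review())
```

Run on a schedule, something like this keeps the improvement metric visible at the moment someone can act on it, rather than in a monthly report.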
Metric mapping
You can get from a bigger-picture diagnostic metric to an actionable improvement metric through a process called metric mapping.
Start with your diagnostic metric, for example, Change Failure Rate.
Think about the boundaries of this metric. What is the big idea that the metric is trying to capture? What are the starting and ending points of any processes it measures, and does it include any sub-processes? What areas of your system would need to improve in order to influence this metric? What do developers think about it?
The answers to these questions will give you smaller, more actionable measurements that are easier for teams to reason about, and more likely to be within a team’s locus of control, or the area where they have autonomy and influence.
Let’s use Change Failure Rate as an example.
What is the big idea the metric is trying to capture? Software quality
What are the starting and ending points of any processes it measures, and does it include any sub-processes? CFR is the result of a few different processes (local testing workflows, CI/CD, QA if any, telemetry, and deployment processes) and is influenced by batch size, build speed, test flakiness, and more.
What areas of your system would need to improve in order to influence this metric? We know that our CI processes are slow and unreliable. We also work on really big changes most of the time, and we know that bigger changes are riskier to deliver.
What do developers think about the big idea? We can measure satisfaction with software quality to see if we’re heading in the right direction with all of these other interventions.
The hypothesis? If this team reduces batch size, reduces CI flakiness, and increases satisfaction with quality practices, then Change Failure Rate will decrease. The improvement metrics give teams a clearer picture of where to focus.
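To make the mapping concrete, here’s a minimal sketch, with made-up record shapes rather than any particular tool’s schema, that computes the diagnostic alongside two of the improvement metrics it decomposes into:

```python
# Sketch: a diagnostic metric (Change Failure Rate) next to two improvement metrics
# it maps to (batch size per change, CI flake rate). Record shapes are illustrative.
from statistics import median

# Each deployment: did it cause a failure in production, and how many lines changed?
deployments = [
    {"caused_failure": False, "lines_changed": 120},
    {"caused_failure": True,  "lines_changed": 1450},
    {"caused_failure": False, "lines_changed": 80},
    {"caused_failure": False, "lines_changed": 300},
]

# Each CI run: did it fail, and did a plain retry make it pass (a flake)?
ci_runs = [
    {"failed": True,  "passed_on_retry": True},
    {"failed": False, "passed_on_retry": False},
    {"failed": True,  "passed_on_retry": False},
    {"failed": False, "passed_on_retry": False},
]

# Diagnostic: share of changes to production that result in degraded service.
change_failure_rate = sum(d["caused_failure"] for d in deployments) / len(deployments)

# Improvement metrics: smaller, team-controllable variables behind the diagnostic.
median_batch_size = median(d["lines_changed"] for d in deployments)
flake_rate = sum(r["passed_on_retry"] for r in ci_runs) / len(ci_runs)

print(f"Change Failure Rate: {change_failure_rate:.0%}")  # diagnostic, reviewed quarterly
print(f"Median batch size:   {median_batch_size} lines")  # improvement, reviewed weekly
print(f"CI flake rate:       {flake_rate:.0%}")           # improvement, reviewed weekly
```

The point is the cadence, not the code: the team watches the two improvement numbers week to week, and expects the diagnostic to move only over a longer horizon.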
Getting started
Identify your use case and activities
When approaching data, you need to ask yourself:
Is this data meant for an engineering organization trying to improve, or is this data for a platform engineering team assessing the impact of their work? This is your use case.
Is this data meant to show me high-level trends, and be useful in a report-card style report? Or is it meant to be zoomed-in and granular, focusing on a specific action or decision? This is your activity, either diagnosing or improving.
If you're leading an engineering organization looking to improve efficiency, focus on diagnostic metrics first to identify key problem areas, then use improvement metrics to guide day-to-day actions.
If you're on a platform team, use diagnostic metrics to demonstrate success and adoption, and to identify new opportunities for impact. Then, use improvement metrics to iterate on internal tools and processes.
Set expectations for how metrics will be used
“If you build it, they will come” doesn’t apply here. Don’t assume that making metrics available will lead to action. Without clear expectations and a system of accountability, it’s easy for metrics and continuous improvement to take a backseat to delivery pressures.
Pressurize the system: Senior leadership should emphasize the importance of metrics in evaluating success and setting priorities. Microsoft’s Edge Thrive initiative, for example, ensures that engineering leaders are accountable for their productivity metrics, which trickles down to teams.
Integrate metrics into organizational workflows: Use metrics in all areas of the business, like retrospectives, planning meetings, and all-hands meetings. Leaders should be talking about these metrics at every opportunity. Even if you feel like a broken record, it's important to keep the message consistent and top of mind. For example, if your teams are trying to decide whether to prioritize improvements to code coverage or build a new feature, looking at a quality metric like Change Failure Rate can guide the discussion.
Make change the goal
Metrics are only useful if they lead to action. To ensure metrics drive change:
Tell a story with data: Rather than presenting raw numbers, frame metrics in the context of progress toward key business goals.
Use industry benchmarks for context: Comparing your organization’s metrics to industry benchmarks can help make data actionable. You can download the full set of DX Core 4 benchmarks here.
Mix qualitative and quantitative data: Looking at quantitative data from systems can tell you what is happening, but only self-reported data from developers can tell you why. For improvement, the “why” is critical.
By structuring your approach using the dimensions of use case (engineering org vs. platform teams) and activity type (diagnostic vs. improvement), you can ensure that data is driving meaningful change rather than becoming an overwhelming reporting exercise.
Next time you find yourself wondering, "Now what?" after collecting developer productivity metrics, ask:
Who is this data for?
Are we diagnosing or improving?
How will this data be used in decision-making?
By answering these questions, you can move from data collection to real impact, making your developer productivity metrics truly useful.
Who’s hiring right now
This week’s featured job openings. See more open roles here.
Scribd is hiring a Senior Manager - Developer Tooling | Remote (US, Canada)
UKG is hiring a Director and Sr Director of Technical Program Management | Multiple locations
Adyen is hiring a Team Lead in Platform Engineering | Amsterdam, Netherlands
Snowflake is hiring a Director of Engineering - Test Framework | Bellevue and Menlo Park
Lyft is hiring an Engineering Manager - DevEx | Toronto, Canada
That’s it for this week. Thanks for reading. If you enjoyed this issue, please consider sharing it:
-Abi