Advocating for qualitative metrics
An excerpt from an article I co-authored with Tim Cochran for Martinfowler.com.
This is the latest issue of my newsletter. Each week I share research and perspectives on developer productivity.
Join us on April 3rd: Laura Tacho and Nathen Harvey are hosting a discussion on anti-patterns and best practices for using the DORA metrics.
Today’s newsletter is an excerpt from an article I co-authored with Tim Cochran for Martinfowler.com. If you’re exploring metrics for developer productivity, this article should be helpful: it outlines some key limitations of traditional metrics and explains how qualitative metrics can be used to provide the insights that leaders need. You can read the article here.
We’ve seen firsthand how qualitative metrics can dramatically enrich leaders’ understanding of developer productivity, especially for leaders who have relied primarily on quantitative metrics. However, there are some common misconceptions about qualitative metrics, which we address in the excerpt shared here.
What do we mean when we say “qualitative metrics”? Our definition: Qualitative metrics are measurements consisting of data provided by humans.
Advocating for qualitative metrics
Executives are often skeptical about the reliability or usefulness of qualitative metrics. Even highly scientific organizations like Google have had to overcome these biases. Engineering leaders are inclined toward system metrics since they are accustomed to working with telemetry data for inspecting systems. However, we cannot rely on this same approach for measuring people.
We’ve seen some organizations get into an internal “battle of the metrics” which is not a good use of time or energy. Our advice for champions is to avoid pitting qualitative and quantitative metrics against each other as an either/or. It’s better to make the argument that they are complementary tools.
We’ve found that the underlying cause of opposition to qualitative data is a set of misconceptions, which we address below. Later in this article, we outline the distinct benefits of self-reported data, such as its ability to measure intangibles and surface critical context.
Misconception: Qualitative data is only subjective
Traditional workplace surveys typically focus on employees’ subjective opinions and feelings. Thus many engineering leaders intuitively believe that surveys can only collect subjective data from developers.
As we describe in the following section, surveys can also capture objective information about facts or events. Google’s DevOps Research and Assessment (DORA) program is an excellent concrete example.
Some examples of objective survey questions:
How long does it take to go from code committed to code successfully running in production?
How often does your organization deploy code to production or release it to end users?
Misconception: Qualitative data is unreliable
One challenge of surveys is that survey questions are often written by people with all manner of backgrounds and no special training. As a result, many workplace surveys do not meet the minimum standards needed to produce reliable or valid measures. Well-designed surveys, however, produce accurate and reliable data (we provide guidance on how to do this later in the article).
Some organizations have concerns that people may lie in surveys. This can happen in situations where there is fear around how the data will be used. In our experience, when surveys are deployed as a tool to help understand and improve bottlenecks affecting developers, there is no incentive for respondents to lie or game the system.
While it’s true that survey data isn’t always 100% accurate, we often remind leaders that system metrics are often imperfect too. For example, many organizations attempt to measure CI build times using data aggregated from their pipelines, only to find that it requires significant effort to clean the data (e.g. excluding background jobs, accounting for parallel jobs) to produce an accurate result.
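To make the cleanup effort concrete, here is a minimal sketch of what “cleaning” pipeline data can involve. The record structure, field names, and job data are hypothetical, not from any particular CI system:

```python
# Hypothetical sketch: computing wall-clock CI build time per pipeline run.
# Raw job records must be cleaned: background jobs excluded, and
# parallel jobs merged into a single wall-clock window per run.

def clean_build_durations(jobs):
    """Return {run_id: wall-clock duration}, excluding background jobs."""
    runs = {}
    for job in jobs:
        if job["background"]:  # e.g. nightly scans would inflate the metric
            continue
        run = runs.setdefault(job["run_id"], {"start": job["start"], "end": job["end"]})
        # Parallel jobs overlap: take the earliest start and the latest end
        run["start"] = min(run["start"], job["start"])
        run["end"] = max(run["end"], job["end"])
    return {run_id: r["end"] - r["start"] for run_id, r in runs.items()}

jobs = [
    {"run_id": 1, "start": 0, "end": 300, "background": False},   # compile
    {"run_id": 1, "start": 0, "end": 480, "background": False},   # tests (parallel)
    {"run_id": 1, "start": 0, "end": 3600, "background": True},   # nightly scan
]
print(clean_build_durations(jobs))  # {1: 480}, not 3600
```

Even this toy version has to make judgment calls (what counts as “background”? should queue time count?), which is exactly the kind of imperfection the paragraph above describes.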
Benefits of qualitative metrics
One argument for qualitative metrics is that they avoid subjecting developers to the feeling of “being measured” by management. While we’ve found this to be true – especially when compared to metrics derived from developers’ Git or Jira data – it doesn’t address the main objective benefits that qualitative approaches can provide.
There are three main benefits of qualitative metrics when it comes to measuring developer productivity:
Qualitative metrics allow you to measure things that are otherwise unmeasurable
System metrics like lead time and deployment volume capture what’s happening in our pipelines or ticketing systems. But there are many more aspects of developers’ work that need to be understood in order to improve productivity: for example, whether developers are able to stay in the flow of work or easily navigate their codebases. Qualitative metrics let you measure these intangibles that are otherwise difficult or impossible to measure.
An interesting example of this is technical debt. At Google, a study to identify metrics for technical debt included an analysis of 117 metrics that were proposed as potential indicators. To the disappointment of Google researchers, no single metric or combination of metrics were found to be valid indicators (for more on how Google measures technical debt, listen to this interview).
While an objective metric for technical debt may yet be discovered, it may well be impossible: assessing technical debt requires comparing the current state of a system or codebase against its imagined ideal state. In other words, human judgment is essential.
Qualitative metrics provide missing visibility across teams and systems
Metrics from ticketing systems and pipelines give us visibility into some of the work that developers do. But this data alone cannot give us the full story. Developers do a lot of work that’s not captured in tickets or builds: for example, designing key features, shaping the direction of a project, or helping a teammate get onboarded.
It’s impossible to gain visibility into all these activities through data from our systems alone. And even if we could theoretically collect all the data through systems, there are additional challenges to capturing metrics through instrumentation.
One example is the difficulty of normalizing metrics across different team workflows. For example, if you’re trying to measure how long it takes for tasks to go from start to completion, you might try to get this data from your ticketing tool. But individual teams often have different workflows that make it difficult to produce an accurate metric. In contrast, simply asking developers how long tasks typically take can be much simpler.
Another common challenge is cross-system visibility. For example, a small startup can measure TTR (time to restore) using just an issue tracker such as Jira. A large organization, however, will likely need to consolidate and cross-attribute data across planning systems and deployment pipelines in order to gain end-to-end system visibility. This can be a yearlong effort, whereas capturing this data from developers can provide a baseline quickly.
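To illustrate why cross-system attribution is hard, here is a hypothetical sketch of the stitching a large organization might need for TTR: incidents live in the issue tracker, while deploy events live in the pipeline, and the two must be joined before any duration can be computed. All field names and data below are illustrative, not from any real system:

```python
# Hypothetical sketch: computing time-to-restore (TTR) by joining data
# from two separate systems — an issue tracker and a deployment pipeline.

incidents = [  # from the issue tracker
    {"id": "INC-1", "opened_at": 100, "fix_commit": "abc123"},
    {"id": "INC-2", "opened_at": 200, "fix_commit": "zzz999"},  # fix not yet shipped
]
deploys = [    # from the deployment pipeline
    {"commit": "abc123", "deployed_at": 460},
    {"commit": "def456", "deployed_at": 900},
]

def time_to_restore(incidents, deploys):
    """Attribute each incident's fix to the deploy that shipped it."""
    deployed = {d["commit"]: d["deployed_at"] for d in deploys}
    return {
        inc["id"]: deployed[inc["fix_commit"]] - inc["opened_at"]
        for inc in incidents
        if inc["fix_commit"] in deployed  # unresolved incidents drop out
    }

print(time_to_restore(incidents, deploys))  # {'INC-1': 360}
```

In a real organization each of these joins (incident to fix, fix to deploy, deploy to restoration) spans teams and tools with inconsistent identifiers, which is why the instrumentation effort can take so long, whereas asking developers yields a baseline quickly.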
Qualitative metrics provide context for quantitative data
As technologists, we find it easy to focus heavily on quantitative measures. They seem clean and clear, after all. There is a risk, however, that without richer data the full story isn’t being told, and that this may lead us to focus on the wrong thing.
One example of this is code review: a typical optimization is to try to speed up code reviews. This seems logical, as waiting for a code review can cause wasted time or unwanted context switching. We could measure the time it takes for reviews to be completed and incentivize teams to improve it. But this approach may encourage negative behavior: reviewers rushing through reviews, or developers not finding the right experts to perform reviews.
Code reviews exist for an important purpose: to ensure high quality software is delivered. If we do a more holistic analysis – focusing on the outcomes of the process rather than just speed – we find that optimization of code review must ensure good code quality, mitigation of security risks, building shared knowledge across team members, as well as ensuring that our coworkers aren’t stuck waiting. Qualitative measures can help us assess whether these outcomes are being met.
Another example is developer onboarding processes. Software development is a team activity. Thus if we only measure individual output metrics, such as the rate at which new developers are committing or time to first commit, we miss important outcomes: whether we are fully utilizing the ideas new developers bring, whether they feel safe asking questions, and whether they are collaborating with cross-functional peers.
Who’s hiring right now
Here’s a roundup of new Developer Experience job openings:
Asana is hiring a Senior Software Engineer - Developer Efficiency | New York City
Netflix is hiring a Technical Program Manager (L5/L6) - Platform | Remote (US)
Plaid is hiring a Product Manager - Developer Platform | San Francisco
Rocket Money is hiring a Team Leader - Engineering Performance & Developer Experience | Various cities or remote (US)
Uber is hiring a Senior Staff Engineer - Developer Platform (Gen AI) | San Francisco, Seattle
Find more DevEx job postings here.
That’s it for this week. If you know someone who might enjoy reading this issue, please consider sharing it:
-Abi