Measuring via Surveys Versus Systems
Comparing two approaches for measuring your software delivery pipeline.
This week I read “DevOps Metrics: Your Biggest Mistake Might Be Collecting the Wrong Data,” a paper by Nicole Forsgren and Mik Kersten. This paper outlines the advantages of using surveys to capture metrics about software development tools and processes, as opposed to capturing metrics solely from systems (e.g., scraping data from a tool like GitHub).
Note: This week I announced that Dr. Nicole Forsgren joined the DX team. You can watch my interview with her to learn more about her perspectives on the advantages of survey-based measurement.
My summary of the paper
Managing and improving any process or system requires insights into that system. But collecting measurements on software delivery is difficult.
There are two separate but complementary approaches to measuring software delivery:
Survey data. Survey measures and techniques provide a holistic and periodic view of the value stream.
System data. Tool-based data provides a continuous view of the value stream, although limited by what can be feasibly collected and correlated.
Neither system data alone nor survey data alone can measure the effectiveness of a modern software delivery pipeline. Yet, some organizations criticize survey-based data wholesale and instead attempt to measure using system data alone (e.g., creating metrics based on data stored in their repositories). These organizations may not understand the limitations of this methodology.
System metrics are limited by the data you are able to collect and correlate from different sources. In large organizations with many different teams and tools, achieving end-to-end cross-system visibility can be a long journey: you first need to deploy a measurement solution across systems, and then make sure that cross-system integration is in place so that the data can be properly correlated.
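To make the correlation problem concrete, here is a minimal sketch that is not taken from the paper. It assumes two hypothetical exports, one from a source control tool and one from a deployment tool, that happen to share a commit SHA as a key. The data, the names `commits` and `deployments`, and the function `lead_times_hours` are all invented for illustration.

```python
from datetime import datetime

# Hypothetical exports from two separate tools, keyed by commit SHA.
# All values here are made up for illustration.
commits = {
    "a1f": datetime(2024, 1, 1, 9, 0),
    "b2e": datetime(2024, 1, 2, 14, 0),
    "c3d": datetime(2024, 1, 3, 11, 0),
}
deployments = {
    "a1f": datetime(2024, 1, 1, 15, 0),
    "c3d": datetime(2024, 1, 4, 11, 0),
}


def lead_times_hours(commits, deployments):
    """Hours from commit to deployment, for commits found in both systems."""
    return {
        sha: (deployments[sha] - committed_at).total_seconds() / 3600
        for sha, committed_at in commits.items()
        if sha in deployments
    }


lead_times = lead_times_hours(commits, deployments)

# Commits with no matching deployment record cannot be measured at all.
uncorrelated = sorted(set(commits) - set(deployments))
coverage = len(lead_times) / len(commits)

print(lead_times)    # lead time only for the commits that could be joined
print(uncorrelated)  # commits that are invisible to the metric
print(f"coverage: {coverage:.0%}")
```

In this toy data, one of the three commits never shows up in the deployment export, so any lead-time figure silently describes only part of the pipeline. Real systems rarely share a clean key like this, which is part of why the integration work takes time.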
In the absence of complete system measurements, surveys can provide a holistic view of a system relatively quickly (i.e., within several weeks). Then, as you become more and more instrumented with system metrics, you can continue using survey-based metrics to augment and validate system metrics, as well as capturing data that's uniquely suited for survey methods (e.g., culture measures).
Comparing survey-based metrics and system metrics
The paper provides a detailed breakdown of both the advantages and challenges of each method of measurement, consolidated into the table below:
We can see in the table there are many aspects to consider when determining whether to use survey-based or system-based metrics. Here are a couple of points that I thought were most interesting:
Survey data provides a more complete picture
System-based data can only tell you about what happens inside the systems that you have fully instrumented. The authors note that “eventually, system-level data can provide a relatively full view of your system, but this requires full instrumentation, plus correlation across measures and maturity in reporting and visualization techniques so that teams can understand system state. This is a nontrivial task.”
Surveys, on the other hand, are “particularly good at providing a complete and holistic view of systems. This is because it can capture information about systems, process, and culture.” Survey design is an important element of getting a comprehensive view of a system.
System data provides higher-precision data
System-based measures can accurately show minute, second, and millisecond differences. But this is often impossible with surveys because humans aren’t good at answering with precision. For example, humans can reliably answer whether something happens once a day, week, or month, but not whether something happens in two seconds or three seconds.
“When you ask about deployment frequency, your survey options increase in log scale: people can generally tell you if they are deploying software on demand, weekly, monthly, quarterly, or yearly.”
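One way to put the two kinds of data side by side is to collapse precise system timestamps into the same coarse categories a survey would offer. The sketch below is my own illustration, not something from the paper. The category labels come from the quote above, but the day thresholds in `BUCKETS`, the use of the median gap, and the function name `frequency_bucket` are assumptions.

```python
from datetime import date
from statistics import median

# Upper bound (in days between deployments) for each survey category.
# These thresholds are assumptions chosen for illustration.
BUCKETS = [
    (1, "on demand"),
    (7, "weekly"),
    (31, "monthly"),
    (92, "quarterly"),
]


def frequency_bucket(deploy_dates):
    """Map system deployment dates onto a coarse survey-style category."""
    if len(deploy_dates) < 2:
        return None  # not enough data to estimate a frequency
    ordered = sorted(deploy_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    typical_gap = median(gaps)
    for upper_bound_days, label in BUCKETS:
        if typical_gap <= upper_bound_days:
            return label
    return "yearly"


# Hypothetical deployment dates pulled from a system, and a survey answer.
system_dates = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15), date(2024, 1, 23)]
survey_answer = "weekly"

system_answer = frequency_bucket(system_dates)
print(system_answer, system_answer == survey_answer)
```

When the two answers disagree, that is a prompt to look closer: the system data may be missing deployments, or the team's perception may differ from what the tools record.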
Final thoughts
When I first read this paper, it struck me that surveys are a vastly underutilized measurement tool in our field (as opposed to other fields such as healthcare, psychology, or economics).
I have recently been fascinated by how few proponents of the “DORA metrics” know that DORA used surveys to capture these metrics, both in their research and in their benchmarking solution for companies. During my time at GitHub, I encountered many large organizations that were investing millions of dollars in data infrastructure to report on the “DORA metrics”, when actionable baselines could have been captured much more easily using surveys.
This paper also reminds me of my recent conversation with Dr. Margaret-Anne Storey about the SPACE framework, where she emphasized the importance of defining the goals of measurement before deciding on what metrics to use or how to capture them. I hope that this paper can be a helpful reference for organizations embarking on the journey of measurement.
That’s it for this week! If you know someone who would enjoy this newsletter, please consider sharing it with them.
As always, reach out with thoughts about this newsletter on Twitter at @abinoda, or reply to this email.
-Abi