Measuring Developers’ Jobs-To-Be-Done
A shift in measurement makes it easier for Google’s developer tooling teams to understand their impact and plan their roadmaps.
This is the latest issue of Engineering Enablement, a weekly newsletter covering the data behind world-class engineering organizations. To get articles like this in your inbox every Friday, subscribe:
Sign up: Laura Tacho and I are hosting a discussion about the DX Core 4, a new framework for reporting on engineering productivity that encompasses DORA, SPACE, and DevEx. Register to join us here.
This week I read Measuring Developer Goals, the latest in Google's Developer Productivity for Humans series. It explains how internal tooling teams can drive productivity improvements by focusing on what developers are trying to achieve, rather than just measuring the tasks they perform.
My summary of the paper
Google used to measure how well developer tools worked by evaluating how they supported certain tasks, like "debugging" or "writing code." However, this approach often lacked the specificity that tooling teams need. For instance, "searching for documentation" is a common task, but the reason behind it—whether it's to "explore technical solutions" or to "understand the context to complete a work item"—can meaningfully change a developer's experience and how well tools support them in achieving their goal.
To provide better insights, Google researchers identified the key goals developers are trying to achieve in their work and developed measurements for each goal. In this paper, they explain their process and share an example of how this new approach has benefited their teams.
Identifying developers’ goals
The researchers set out to create a concise list of critical jobs-to-be-done that they could track and measure. They established several criteria that each goal needed to meet:
Each goal needed to be durable, meaning it should be as relevant five years from now as it is today.
Each goal needed to make sense to developers. They knew the goals were going to be measured using a survey, so having goals that developers could understand and see as part of their workflows was critical.
Each goal needed to connect to observable developer behaviors. The researchers also knew that they were going to use system-based metrics to measure these goals, so they needed to be able to map them to specific developer actions that could be tracked.
Collectively, goals needed to be consistent in altitude or scope.
Goals also needed to be comprehensive, covering the entire software development lifecycle.
With these criteria in mind, the researchers then conducted an extensive and iterative process to identify critical developer goals:
They gathered a small group of subject matter experts to create an initial list of goals spanning all software development phases, with a target size of roughly 30 goals to keep the list focused.
Then they mapped the draft list of goals to historical data from the quarterly Engineering Satisfaction (EngSat) survey to ensure comprehensive coverage of development tasks.
They gathered cross-functional feedback and conducted two rounds of user research: first a moderated card sort with six Google developers to evaluate goal clarity and organization, then a larger unmoderated card sort with 40 participants for further validation.
The process concluded with a final round of cross-functional feedback and cognitive user testing during the EngSat launch.
The researchers identified 30 developer goals that are both comprehensive and easy for the company’s developer tooling teams to apply.
They then mapped each goal to specific teams, to help ensure that each team's efforts are aligned with developers’ goals.
Measurement
Google developed a measurement system that tracks how developers work by analyzing the actions they take when they use different tools. This system helps them understand the steps developers go through to achieve their goals, like setting up a server or debugging code. It’s the product of several years’ worth of work, but still interesting to learn from.
One key point is that the system needs to be very precise about what actions signal the start, progress, and completion of a task. For example, a developer might have a goal like "deploy a server into production." Initially, this goal might seem straightforward, but when broken down, it involves several specific steps, like "identifying the necessary configuration settings." The system helps clarify these steps and tracks them accurately.
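To make the idea concrete, here is a minimal sketch of what mapping tool events to a goal's start, progress, and completion signals might look like. The event names and the `GOAL_SIGNALS` table are hypothetical illustrations, not Google's actual telemetry schema:

```python
from dataclasses import dataclass

# Hypothetical signal definitions for one goal; the real system's event
# taxonomy is not public.
GOAL_SIGNALS = {
    "deploy_server_to_production": {
        "start": {"open_deploy_config"},          # e.g. identifying configuration settings
        "progress": {"edit_deploy_config", "run_canary"},
        "complete": {"production_release_success"},
    },
}

@dataclass
class Event:
    developer: str
    name: str
    timestamp: float

def goal_status(events, goal):
    """Classify a developer's event stream against one goal's signals."""
    signals = GOAL_SIGNALS[goal]
    seen = {e.name for e in events}
    if signals["complete"] & seen:
        return "completed"
    if signals["start"] & seen:
        return "in_progress" if signals["progress"] & seen else "started"
    return "not_started"
```

The point of the precision requirement is visible here: until "identifying the necessary configuration settings" is pinned to a concrete, observable event, the system cannot tell that the goal has even begun.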
The system is also designed to be flexible. Developers often achieve their goals in different ways, and the system can track these various paths. For example, some developers might follow a golden path while others might not. The system doesn't just focus on the most efficient path but also captures data on why developers might choose other methods. This flexibility also allows the system to track how many attempts a developer makes to fix a failing test, rather than just measuring the time between the first failure and eventual success. This gives a fuller picture of the challenges developers face.
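The attempts-versus-elapsed-time distinction can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation; it assumes a simple chronological list of test outcomes:

```python
def count_fix_attempts(test_runs):
    """Count runs between the first failure and the first subsequent
    success, rather than just measuring elapsed time.

    `test_runs` is a chronological list of "pass"/"fail" outcomes.
    Returns the number of attempts up to and including the first passing
    run after the initial failure, 0 if the test never failed, or None
    if it is still failing.
    """
    try:
        first_fail = test_runs.index("fail")
    except ValueError:
        return 0  # never failed, nothing to fix
    for i, outcome in enumerate(test_runs[first_fail + 1:], start=1):
        if outcome == "pass":
            return i
    return None  # still failing
```

Two developers who both take an hour to fix a test look identical by elapsed time, but one may have made two attempts and the other ten, which tells a very different story about the tooling.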
Combining system-based data with survey-based data
To get a deeper understanding, Google combines this system-based data with survey data that captures how well supported developers feel for each of the 30 identified goals. The combination fills in gaps and gives Google a more thorough understanding of which actions to take to improve developer productivity.
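A minimal sketch of that join might look like the following. The goal names, metrics, and scores below are made-up placeholders, assuming one behavioral metric and one survey score per goal:

```python
# Hypothetical per-goal data; the real pipeline joins EngSat survey
# responses with telemetry-derived metrics.
system_metrics = {  # goal -> median completion time in minutes (from logs)
    "debug a failing test": 42.0,
    "deploy a server into production": 95.0,
}
survey_scores = {  # goal -> mean "how well supported" score, 1-5 (from survey)
    "debug a failing test": 3.1,
    "deploy a server into production": 4.2,
}

def combine(metrics, scores):
    """Pair each goal's behavioral metric with its survey score, so that
    slow-but-satisfied and fast-but-frustrated goals can be told apart."""
    return {
        goal: {"median_minutes": metrics[goal], "support_score": scores[goal]}
        for goal in metrics.keys() & scores.keys()
    }
```

Neither data source alone distinguishes a goal that is slow because it is inherently hard from one that is slow because the tools fight the developer; the pairing does.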
The researchers note that the real value in combining behavioral data with survey feedback is that it provides a more complete understanding of what developers need. This approach goes beyond just measuring surface-level metrics and helps uncover deeper insights that can guide product development, improve developer satisfaction, and ultimately drive business success.
Using goal-based measurement to improve the code review process
The authors share an example of how moving from task-based measurement to goal-based measurement had a positive impact.
Previously, Google asked the survey question, "How well do the tools you use support you in the following tasks?", where "code review" was one of 70 listed response options. As they shifted their focus to developer goals, they made the question more specific: "How well did the developer tools you currently use at Google support you in the following developer activities?", with the response options being the previously identified developer goals. For example, one option became "Ensure the code contributed by others (e.g., teammates, AI, etc.) is high quality" instead of simply "code review."
This language change focused the survey item on developers' goal—ensuring that submitted code is high quality—rather than just the task of completing a code review. As a result, survey scores shifted modestly downward, raising questions for the developer tooling team about how internal tools could best support developers in writing and reviewing high-quality code.
They also combined these survey insights with data extracted from the tools developers use in the code review process. By looking at both the survey responses and the tool usage data, they discovered that developers who review code more often (and in less time) feel less well supported than those who review code less often. This finding prompted new studies into the review loads and practices of the engineers who don't feel well supported, with the aim of building features that better support them.
Final thoughts
This paper points out an important challenge related to measurement: developer experience surveys tend to surface the problem areas that developers face, but internal tooling teams need more specific answers about the impact their tools are having and what can be improved. Google solved this problem by making their survey questions more specific. Another approach is to use in-the-moment surveys in addition to periodic DevEx surveys.
Here’s how we describe this in DevEx: What Actually Drives Productivity:
LinkedIn is an example of a company that uses both periodic surveys (quarterly) and real-time feedback (in-the-moment surveys). Here’s how Max Kanat-Alexander described their approach in an issue of The Pragmatic Engineer:
This approach allows teams to start broad and identify the problem, then use in-the-moment surveys to capture more specific feedback.
Whatever approach you choose, the point is that internal tooling teams can benefit from having more specific insights about how their tools are supporting developers as they go about their jobs-to-be-done.
Who’s hiring right now
Here is a roundup of Developer Productivity job openings. Find more open roles here.
SiriusXM is hiring a Staff Software Engineer - Platform Observability | US
Roku is hiring a Senior Manager - Developer Tools | Cambridge
CarGurus is hiring a Principal Cloud Architect - Platform | Boston
Citi is hiring a Director - Engineering Excellence | Irving
Snowflake is hiring a Senior Engineer - Developer Productivity | Bellevue
That’s it for this week. Thanks for reading.
-Abi