Measuring Developer Experience at Google
A close look at how Google’s Engineering Satisfaction survey works.
This is the latest issue of my newsletter. Each week I share research and perspectives on developer productivity.
Join this live discussion next week to hear advice for establishing a strong platform engineering function.
This week I read Measuring Developer Experience With a Longitudinal Survey, the newest installment in a research series from Google. This paper covers their Engineering Satisfaction (EngSat) survey, which Google has used since 2018 to measure and understand developer productivity. The EngSat survey has evolved significantly over the years, and in this paper the authors describe the challenges they’ve seen and how they’ve refined their approach.
My summary of the paper
Developer surveys have several advantages: they’re flexible, fast, and can capture data on topics that are difficult or impossible to measure otherwise. For these reasons, they’re a good starting point for organizations to measure developer productivity. Organizations that are further along, like Google, use these insights in conjunction with system-based metrics to provide a complete picture of what developers are doing and what is affecting their work.
Here, the authors share how the EngSat survey started, key elements of the program, and some examples of how the results have been used:
How the EngSat survey started
The research team took several key steps when initially developing the survey:
1. They established a clear and unique goal for the survey. The researchers started by making sure the EngSat survey would capture insights not covered by other surveys or data sources like logs data or general employee surveys. They differentiated from employee engagement surveys by focusing on developer-specific tasks. They differentiated EngSat from logs data by focusing on topics that are typically difficult to measure (for example, technical debt, flow, or code quality) and simply not reflected in their logs data (for example, satisfaction or hindrances to productivity).
2. They collaborated with domain experts and researchers to develop an effective instrument. To ensure the survey would provide useful insights, the researchers partnered with subject-matter experts to develop and test the questions to ensure that they were representative of developer workflows. They also leveraged best practices for survey design, which they were able to do thanks to the skill sets already available on the research team.
3. They gathered stakeholder buy-in. The authors note that criticisms of surveys often emerge when results suggest something negative or counterintuitive: “Surveys often face the same criticisms regardless of quality, scale, or domain. For example, critics contend that surveys are subjective (for example, susceptible to mistakes and misinterpretations), biased (for example, respondents may have an incentive to respond inaccurately), and more reflective of how people feel than what is actually happening.” They worked to challenge these perceptions by pairing the survey results with logs-based measures and using the combined picture to reveal the value of bringing both data sources together.
Note: In an interview, two members of Google’s research team explained how they initially faced skepticism from leaders about using surveys. However, after running the EngSat program for some time, this skepticism diminished as the survey demonstrated its value.
Key elements of the program today
The EngSat survey has been conducted in the last two weeks of every quarter for over six years. The authors highlight the key factors that sustain the program:
Consistent and adequate staffing. There’s an established rotation so one UX researcher and one engineer jointly own the program each quarter. The UX researcher drives the management and evolution of the survey instrument, program operations, and communications. The engineer drives the management of the automated data processing and analysis infrastructure.
The process for preparing, running, and analyzing the survey. The paper illustrates this process in a figure. “The process (and its associated artifacts) make the program more efficient to execute, more consistent by design, more portable across staff, and helps us accrue improvements over time.”
The infrastructure and automation supporting the program. The research team has developed common infrastructure that enables them to clean and aggregate the survey data so it can be made available in dashboards and reports (a rough sketch of such a pipeline follows below).
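The paper doesn’t share implementation details, but to make the idea concrete, here’s a minimal sketch in Python of what a survey-processing pipeline might look like. The CSV export, column names, and 5-point Likert scale are my assumptions for illustration, not details from the paper:

```python
# Hypothetical sketch: clean raw survey responses and roll them up
# into dashboard-ready quarterly aggregates. All field names and the
# Likert scale are illustrative assumptions.
import pandas as pd

LIKERT = {
    "Very dissatisfied": 1, "Dissatisfied": 2, "Neutral": 3,
    "Satisfied": 4, "Very satisfied": 5,
}

def clean_responses(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows and convert Likert labels to numeric scores."""
    df = raw.dropna(subset=["respondent_id", "question_id", "answer"])
    df = df[df["answer"].isin(LIKERT)].copy()
    df["score"] = df["answer"].map(LIKERT)
    return df

def aggregate(df: pd.DataFrame) -> pd.DataFrame:
    """Roll up per-question results by quarter: mean score, % favorable, n."""
    return (
        df.groupby(["quarter", "question_id"])
          .agg(mean_score=("score", "mean"),
               pct_favorable=("score", lambda s: (s >= 4).mean()),
               n=("score", "size"))
          .reset_index()
    )

if __name__ == "__main__":
    raw = pd.read_csv("engsat_responses.csv")  # hypothetical export
    summary = aggregate(clean_responses(raw))
    summary.to_csv("engsat_quarterly_summary.csv", index=False)
```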
How results have been used
EngSat has given Google the ability to track changes over time. The authors give a few examples of how the data collected from EngSat has been used:
1. Tracking the impact of the pandemic. During COVID-19, the EngSat survey allowed Google to identify a notable decrease in productivity between Q1 2020 and Q2 2020 as many people began working from home for the first time. Insights from EngSat also helped motivate improvements to remote access and connectivity, which were key pain points. Over time, they saw productivity rebound and even exceed pre-pandemic levels.
2. Reducing technical debt. In 2019, EngSat results highlighted technical debt as a top hindrance to productivity, which motivated efforts across Google to reduce it. The research team partnered with internal teams to develop best practices, management plans, and initiatives for addressing technical debt. Since these resources were disseminated company-wide, technical debt sentiment on EngSat has improved substantially.
3. Validating logs-based metrics. The research team uses the EngSat survey to develop and validate new logs-based metrics. For example, when they wanted a measure of how often developers experience flow or focused work, they started by interviewing developers who had reported different frequencies of flow on a recent EngSat survey. Then, after establishing a logs-based approach to measuring focused work, they validated it against EngSat data to ensure the metric was representative of self-reported experiences (a sketch of this validation step follows below). They also use the survey to identify which factors predict more frequent flow.
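To illustrate what that validation step could look like, here’s a minimal sketch, assuming a hypothetical joined dataset of self-reported flow frequency and a logs-based focus-time metric. The file, field names, and the use of Spearman correlation are my assumptions, not the paper’s stated method:

```python
# Hypothetical sketch: compare a logs-based "focus time" metric against
# self-reported flow frequency from the survey. Field names, the file,
# and the ordinal scale are illustrative assumptions.
import pandas as pd
from scipy.stats import spearmanr

# One row per respondent: reported_flow is ordinal (e.g., 1 = "rarely"
# ... 5 = "daily"); logged_focus_hours is the logs-based metric.
df = pd.read_csv("flow_validation.csv")

rho, p_value = spearmanr(df["reported_flow"], df["logged_focus_hours"])
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3g})")

# A meaningful positive correlation suggests the logs-based metric
# tracks self-reported experience; a weak one would send the team back
# to refine the metric's definition.
```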
Maintaining the health of the program
There are two primary problems that are likely to affect any survey program: 1) increasing survey length and 2) decreasing response rates. Here’s how Google has handled both issues:
Increasing survey length is natural: it’s easier to add questions than take them away.
The researchers identified a subset of questions that they have committed to keeping consistent. These are high-level questions around productivity, satisfaction, velocity, quality, and flow that they see value in consistently measuring.
Additionally, they occasionally run major streamlining initiatives aimed at reducing duplicative or obsolete questions.
Decreased response rates can happen when developers don’t see what happens with their responses. This is an important problem to solve because the program depends on getting enough data to be representative and reliable. To address it, Google 1) uses sampling and 2) emphasizes transparency and accountability in its reports.
Sampling: The developer population is split into three random cohorts, with one cohort surveyed each quarter. Because there are three cohorts but four quarters, each cohort is surveyed in a different quarter from one year to the next. Sampling this way reduces the sample size but means engineers are asked to respond less frequently, while the cohorts can still be studied over time. Google’s engineering population is large enough that the split still yields a response count with enough statistical power to detect significant differences over time (see the sketch after this list).
Transparency: A summary report of EngSat is widely distributed each quarter, so anyone at Google can see what is being measured and how the team is acting on their feedback. Reporting and sharing aggregated data and impact in this manner encourages developers to keep taking the survey year after year.
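As a rough illustration of the cohort scheme, here’s a minimal sketch in Python. The hashing function, start year, and rotation logic are my assumptions; the paper only states that the population is split into three random cohorts with one surveyed per quarter:

```python
# Hypothetical sketch: stable three-cohort assignment plus a quarterly
# rotation. The hash-based assignment and start year are assumptions.
import hashlib

NUM_COHORTS = 3

def cohort(user_id: str) -> int:
    """Stable pseudo-random cohort assignment derived from a user ID."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_COHORTS

def surveyed_cohort(year: int, quarter: int, start_year: int = 2018) -> int:
    """Which cohort is surveyed in a given quarter (quarters are 1-4)."""
    quarters_elapsed = (year - start_year) * 4 + (quarter - 1)
    return quarters_elapsed % NUM_COHORTS

# Example: cohort 0 is surveyed in Q1 2018, then Q4 2018, then Q3 2019.
for year in (2018, 2019):
    for q in (1, 2, 3, 4):
        print(year, f"Q{q}", "-> cohort", surveyed_cohort(year, q))
```

Because four quarters per year don’t divide evenly into three cohorts, the quarter in which any given cohort is surveyed shifts each year, which matches the paper’s observation that individual cohorts end up surveyed in different quarters from year to year.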
Recently, the research team made a significant change to the EngSat survey by removing questions aimed at smaller developer groups. This was a tough decision, but they opted for more general questions to better support company-wide decisions. The new questions focus on five key outcomes and four additional themes that drive those outcomes. This shift keeps the survey relevant and useful at a broader organizational level.
Final thoughts
Surveys are excellent for understanding developer productivity, but it's important to recognize that an effective program can require significant effort if managed in-house. The authors conclude with a summary of their advice for others looking to run a survey program:
Establish a clear and unique goal for your survey (ensure that there are no other survey programs or data sources that you can already use).
Collaborate with domain experts and researchers to develop an effective instrument.
Gather stakeholder buy-in (for example, communicate the value of survey data and partner with teams that can act on your results).
If you are planning to run a longitudinal survey, invest time in the beginning in setting up the right infrastructure, process, templates, and documentation.
Maintain the health of your survey program (control survey length and sample strategically).
Be transparent and accountable. The quality of your insights depends on the feedback provided by your developers, so make it clear why they should spend their time on it.
Google has been generous in sharing how they're measuring and improving engineering productivity. This paper gives an excellent blueprint of their EngSat survey while offering useful advice (e.g., “make reports transparent” and “ensure consistent staffing”) that others can use to create effective developer survey programs within their organizations.
Who’s hiring right now
Here is a roundup of DevEx job openings. Find more open roles here.
Netflix is hiring a Product Manager - Developer Platform | US
Plaid is hiring a Product Manager - Developer Platform | US
Snyk is hiring a VP, Engineering - Developer Experience | Boston, London
Webflow is hiring an Engineering Manager - Developer Productivity | US
VTEX is hiring an Engineering Manager - Developer Experience | Brazil
Thanks for reading. If you know someone who might like this issue, consider sharing it with them:
-Abi