<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Engineering Enablement: Engineering Enablement Podcast]]></title><description><![CDATA[A weekly podcast covering how top companies measure and improve developer productivity]]></description><link>https://newsletter.getdx.com/s/podcast</link><image><url>https://substackcdn.com/image/fetch/$s_!Niij!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbd433b-6f11-4042-8b7d-0edb3b172966_1024x1024.png</url><title>Engineering Enablement: Engineering Enablement Podcast</title><link>https://newsletter.getdx.com/s/podcast</link></image><generator>Substack</generator><lastBuildDate>Sat, 04 Apr 2026 13:50:12 GMT</lastBuildDate><atom:link href="https://newsletter.getdx.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Abi Noda]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[abinoda@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[abinoda@substack.com]]></itunes:email><itunes:name><![CDATA[Abi Noda]]></itunes:name></itunes:owner><itunes:author><![CDATA[Abi Noda]]></itunes:author><googleplay:owner><![CDATA[abinoda@substack.com]]></googleplay:owner><googleplay:email><![CDATA[abinoda@substack.com]]></googleplay:email><googleplay:author><![CDATA[Abi Noda]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Scaling developer experience across 1,000 engineers at Dropbox]]></title><description><![CDATA[Listen now | I talk with Uma Namasivayam of Dropbox about treating developer productivity as a business problem, running developer experience like a product, and building foundations that make AI useful at scale.]]></description><link>https://newsletter.getdx.com/p/scaling-developer-experience-across</link><guid isPermaLink="false">https://newsletter.getdx.com/p/scaling-developer-experience-across</guid><dc:creator><![CDATA[Laura Tacho]]></dc:creator><pubDate>Fri, 06 Feb 2026 18:40:25 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/186996290/472b31ff92db4afe2b7d541e11bae71a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/ZCg2k-w6o2o">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>Developer productivity is often treated as a tooling problem or a sentiment problem. In reality, it&#8217;s neither. It&#8217;s a socio-technical systems problem that spans engineering foundations, leadership alignment, organizational design, and culture.</p><p>In this episode, I&#8217;m joined by Uma Namasivayam, Senior Director, Engineering Productivity at Dropbox, to explore how Dropbox approaches developer experience at scale. We talk about why productivity needs to be framed as a business problem, how executive alignment creates the conditions for meaningful change, and what it takes to treat developer experience as a real product with developers as customers.</p><p>We also dig into Dropbox&#8217;s approach to AI adoption. 
Uma shares why strong foundations, such as build, test, and observability, are prerequisites for AI to actually accelerate work, how Dropbox encourages daily AI use without mandating a single tool, and where build-versus-buy decisions break down at scale.</p><p>We close with an honest look at what remains unsolved: how to connect gains in developer productivity and AI-driven capacity to real business outcomes, and where engineering leaders should focus next in 2026.</p><div id="youtube2-ZCg2k-w6o2o" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;ZCg2k-w6o2o&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/ZCg2k-w6o2o?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><h4><strong>Developer productivity is a socio-technical problem</strong></h4><ul><li><p><strong>Productivity cannot be solved through tooling alone</strong>; it spans engineering systems, leadership behavior, organizational structure, and people practices.</p></li><li><p><strong>Problems like build and test are engineering problems</strong>, while <strong>problems like focus time and interruptions are people problems</strong>, and both matter equally.</p></li><li><p><strong>Treating productivity as a system forces tradeoffs to be explicit</strong>, rather than hidden inside isolated tooling initiatives.</p></li></ul><h4><strong>Executive alignment matters more than any single metric</strong></h4><ul><li><p><strong>Top-down sponsorship creates permission to act</strong>, especially when productivity work cuts across org boundaries.</p></li><li><p><strong>A shared framework creates alignment, not answers</strong>; its value is giving leaders and engineers a common language.</p></li><li><p><strong>System metrics matter more than single metrics</strong>, because productivity improvements rarely move one dimension in isolation.</p></li><li><p><strong>Distributed accountability makes productivity a company problem</strong>, not a developer experience team problem.</p></li></ul><h4><strong>Developer experience works best when treated as a product discipline</strong></h4><ul><li><p><strong>Developers are customers</strong>, and their experience must be understood through both qualitative feedback and quantitative signals.</p></li><li><p><strong>Good system metrics do not guarantee good developer experience</strong>, which is why sentiment and perception matter.</p></li><li><p><strong>DX surveys surface where systems break differently for different teams</strong>, such as desktop, mobile, and web developers.</p></li><li><p><strong>Continuous feedback loops are essential</strong>, combining surveys, direct conversations, and usage data.</p></li><li><p><strong>Internal communication is part of the product</strong>, reinforcing to developers that their feedback leads to real change.</p></li></ul><h4><strong>Prioritization requires structure, not intuition</strong></h4><ul><li><p><strong>Finite capacity makes prioritization unavoidable</strong>, even in large, well-resourced engineering orgs.</p></li><li><p><strong>Segmenting developer populations clarifies tradeoffs</strong>, since different teams experience different bottlenecks.</p></li><li><p><strong>DX survey data provides a defensible 
starting point</strong>, but prioritization still requires judgment.</p></li><li><p><strong>Leadership-level stack ranking helps resolve conflicts</strong>, especially when multiple teams compete for attention.</p></li><li><p><strong>Frameworks make hard decisions easier to explain</strong>, even when they do not make them easy.</p></li></ul><h4><strong>AI and developer experience must advance in parallel</strong></h4><ul><li><p><strong>AI accelerates work, while developer experience reduces friction</strong>, and both are required for sustained gains.</p></li><li><p><strong>Foundational systems act as plumbing</strong>, enabling trust in speed, quality, and safety.</p></li><li><p><strong>Without strong CI, testing, and observability</strong>, faster code creation increases risk instead of value.</p></li><li><p><strong>Trust in guardrails enables confidence in AI-assisted development</strong>, especially at scale.</p></li></ul><h4><strong>AI adoption succeeds through choice, not mandates</strong></h4><ul><li><p><strong>Early organic adoption revealed real developer needs</strong>, rather than forcing a single tool.</p></li><li><p><strong>Different teams require different AI tools</strong>, particularly for mobile, desktop, and large-repo workflows.</p></li><li><p><strong>Supporting multiple tools increased adoption</strong>, rather than reducing it.</p></li><li><p><strong>Daily use depends on fitting AI into existing workflows</strong>, not adding extra steps.</p></li><li><p><strong>Habits matter more than access</strong>, which is why SDLC-level integration is critical.</p></li></ul><h4><strong>Build vs. buy decisions change at scale</strong></h4><ul><li><p><strong>Many AI tools fail when tested at large-company scale</strong>, despite working well in smaller contexts.</p></li><li><p><strong>Cost and performance become gating factors</strong>, not feature completeness.</p></li><li><p><strong>Internal platforms can abstract complexity</strong>, enabling teams to build AI workflows safely and consistently.</p></li><li><p><strong>Shared internal platforms unlock reuse</strong>, allowing teams to innovate without rebuilding infrastructure.</p></li><li><p><strong>Speed of iteration remains the primary differentiator</strong>, even when building in-house.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=45s">00:45</a>) Dropbox&#8217;s engineering org</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=119s">01:59</a>) Why developer productivity is a business problem</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=248s">04:08</a>) The role of executive sponsorship in developer productivity</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=362s">06:02</a>) How DX&#8217;s Core Four framework created a shared language</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=493s">08:13</a>) Treating developer experience as a product</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=690s">11:30</a>) How Dropbox prioritizes developer experience work</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=860s">14:20</a>) The challenge of tying developer experience to business outcomes</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=998s">16:38</a>) How AI and developer experience intersect at Dropbox</p><p>(<a 
href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=1115s">18:35</a>) The prerequisites for AI adoption to accelerate work</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=1226s">20:26</a>) How Dropbox encourages daily AI use</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=1392s">23:12</a>) AI use beyond code completion</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=1500s">25:00</a>) Managing AI tool demand at scale</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=1676s">27:56</a>) Early results from Dropbox&#8217;s AI efforts</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=1805s">30:05</a>) Progress on developer experience at Dropbox</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=1975s">32:55</a>) Advice for organizations investing in developer experience</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=2065s">34:25</a>) Capacity tradeoffs for developer experience</p><p>(<a href="https://www.youtube.com/watch?v=ZCg2k-w6o2o&amp;t=2159s">35:59</a>) The unanswered questions around AI and capacity in 2026</p><h2><strong>Referenced:</strong></h2><p>&#8226; <a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p><p>&#8226; <a href="https://www.dropbox.com/">Dropbox.com</a></p>]]></content:encoded></item><item><title><![CDATA[AI and productivity: A year-in-review with Microsoft, Google, and GitHub researchers]]></title><description><![CDATA[Listen now | Listen and watch now on YouTube, Apple, and Spotify.]]></description><link>https://newsletter.getdx.com/p/ai-and-productivity-a-year-in-review</link><guid isPermaLink="false">https://newsletter.getdx.com/p/ai-and-productivity-a-year-in-review</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Mon, 29 Dec 2025 17:17:45 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/182642831/a8424bdccac94a35590d7bc3484b6193.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/xZjvYMuAJPc">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>As we close out 2025, I wanted to step back and take stock of what we have actually learned about AI adoption in engineering organizations. Not just where usage has increased, but where impact is real, where it is overstated, and what questions remain unanswered.</p><p>In this special year-end episode, I&#8217;m joined by Brian Houck from Microsoft, Collin Green and Ciera Jaspan from Google, and Eirini Kalliamvakou from GitHub. Together, we unpack the research each of them worked on this year and explore how leading organizations are thinking about AI measurement, developer experience, and long-term productivity. We talk candidly about why measuring AI&#8217;s impact is so difficult, why familiar metrics like lines of code keep resurfacing despite their flaws, and how multidimensional approaches like SPACE and DORA offer a more realistic lens.</p><p>We also look ahead to 2026. We discuss how AI is beginning to reshape the identity of the developer, how junior engineers&#8217; skill sets may evolve, where agentic workflows are gaining traction, and why some of the most widely shared AI studies were misunderstood. 
This episode is an honest conversation about moving past hype and toward a more grounded, evidence-based approach to AI adoption in engineering teams.</p><div id="youtube2-xZjvYMuAJPc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;xZjvYMuAJPc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/xZjvYMuAJPc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><h4><strong>Measuring AI impact requires multiple lenses</strong></h4><ul><li><p><strong>There is no single metric that can capture AI&#8217;s impact.</strong> Developer productivity and experience are inherently multidimensional, requiring trade-offs to be evaluated across speed, quality, collaboration, and meaning.</p></li><li><p><strong>Frameworks like SPACE and DORA help avoid metric tunnel vision.</strong> They encourage teams to examine complementary signals rather than optimizing one dimension at the expense of others.</p></li><li><p><strong>Measurement must reflect systems, not tools.</strong> AI does not operate in isolation; its impact depends on organizational context, workflows, and existing engineering practices.</p></li></ul><h4><strong>Why familiar metrics keep failing us</strong></h4><ul><li><p><strong>Lines of code remains a deeply misleading metric.</strong> AI tends to generate verbose code, making raw output a poor proxy for productivity, quality, or long-term maintainability.</p></li><li><p><strong>More code does not equal better outcomes.</strong> Excess code can increase maintenance burden, technical debt, and cognitive load over time.</p></li><li><p><strong>Easy-to-measure metrics are often the most dangerous.</strong> Their simplicity makes them attractive during periods of uncertainty, even when they obscure what is actually changing.</p></li></ul><h4><strong>The limits of tracking AI-generated code</strong></h4><ul><li><p><strong>Measuring the percentage of AI-generated code oversimplifies reality.</strong> AI may write, delete, refactor, or reorganize code in ways that raw percentages fail to capture.</p></li><li><p><strong>AI-generated code does not inherently signal higher risk.</strong> In some contexts, AI output may be more consistent or higher quality than human-written code.</p></li><li><p><strong>These metrics are better used as supporting signals, not goals.</strong> They can inform budgeting, experimentation, or adoption patterns but should not drive performance targets.</p></li></ul><h4><strong>How AI is reshaping the role of the developer</strong></h4><ul><li><p><strong>Developers are shifting from implementers to orchestrators.</strong> Advanced AI users spend more time framing problems, setting context, and validating outcomes than writing raw code.</p></li><li><p><strong>AI fluency is becoming a core skill.</strong> Knowing how to guide, correct, and collaborate with agents is increasingly important.</p></li><li><p><strong>Adoption follows a progression.</strong> Developers tend to move from skepticism to exploration, collaboration, and eventually strategic use as expectations recalibrate.</p></li></ul><h4><strong>What this means for junior engineers</strong></h4><ul><li><p><strong>Skill development may accelerate rather than disappear.</strong> Junior engineers may
practice delegation, planning, and system-level thinking earlier by working with AI agents.</p></li><li><p><strong>Technical fundamentals still matter.</strong> Understanding architecture, requirements, and failure modes remains essential for supervising AI-generated work.</p></li><li><p><strong>Interpersonal skills risk being deprioritized.</strong> Managing agents is not the same as managing people, raising concerns about how collaboration skills develop over time.</p></li></ul><h4><strong>AI is not just a productivity tool</strong></h4><ul><li><p><strong>Creativity and innovation benefit from friction.</strong> Research suggests that exposing decision points and seams can create space for new ideas rather than faster repetition.</p></li><li><p><strong>Automating everything is not always desirable.</strong> Removing all toil may reduce opportunities for learning, insight, and creative problem-solving.</p></li><li><p><strong>AI should augment thinking, not replace it.</strong> Tools that surface trade-offs and choices can support better outcomes than those that simply eliminate effort.</p></li></ul><h4><strong>High-leverage AI use cases focus on toil</strong></h4><ul><li><p><strong>Developers spend only about 14% of their time writing code.</strong> Optimizing coding alone rarely leads to large productivity gains.</p></li><li><p><strong>The biggest opportunities lie in removing friction.</strong> Documentation, compliance tasks, incident response, flaky tests, and knowledge discovery consistently rank as top pain points.</p></li><li><p><strong>AI excels at work developers dislike but must still do.</strong> Automating dull, repetitive tasks can improve satisfaction and free time for meaningful work.</p></li></ul><h4><strong>Why leadership and change management matter</strong></h4><ul><li><p><strong>AI adoption is a human problem before it is a technical one.</strong> Organizations that understand developer pain points deploy AI more effectively.</p></li><li><p><strong>Agentic workflows amplify organizational differences.</strong> Teams with strong experimentation cultures and feedback loops move faster and with less friction.</p></li><li><p><strong>Culture determines outcomes.</strong> How leaders communicate expectations, normalize experimentation, and support learning shapes whether AI adoption succeeds or stalls.</p></li></ul><h4><strong>Looking ahead to 2026</strong></h4><ul><li><p><strong>Task parallelization is an emerging frontier.</strong> Developers are beginning to use agents to explore multiple solution paths simultaneously.</p></li><li><p><strong>Collaboration with agents will redefine productivity.</strong> Teams, not just individuals, will increasingly work alongside AI systems.</p></li><li><p><strong>Research must evolve with the work itself.</strong> New workflows will require new metrics, new telemetry, and new ways of understanding impact.</p></li></ul><h4><strong>Lessons from the METR paper</strong></h4><ul><li><p><strong>Context matters more than headlines suggest.</strong> Results showing slower performance often reflected expert developers working in familiar codebases.</p></li><li><p><strong>AI may help most where familiarity is lowest.</strong> New domains, unfamiliar systems, and onboarding scenarios show different outcomes.</p></li><li><p><strong>Media oversimplification distorts understanding.</strong> Nuance is critical when interpreting AI research, especially as studies move into real-world environments.</p></li></ul><h2><strong>In this episode, we
cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=155s">02:35</a>) Introducing the panel and the focus of the discussion</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=283s">04:43</a>) Why measuring AI&#8217;s impact is such a hard problem</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=330s">05:30</a>) How Microsoft approaches AI impact measurement</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=400s">06:40</a>) How Google thinks about measuring AI impact</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=448s">07:28</a>) GitHub&#8217;s perspective on measurement and insights from the DORA report</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=635s">10:35</a>) Why lines of code is a misleading metric</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=867s">14:27</a>) The limitations of measuring the percentage of code generated by AI</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=1104s">18:24</a>) GitHub&#8217;s research on how AI is shaping the identity of the developer</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=1299s">21:39</a>) How AI may change junior engineers&#8217; skill sets</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=1482s">24:42</a>) Google&#8217;s research on using AI and creativity</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=1584s">26:24</a>) High-leverage AI use cases that improve developer experience</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=1958s">32:38</a>) Open research questions for AI and developer productivity in 2026</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=2133s">35:33</a>) How leading organizations approach change and agentic workflows</p><p>(<a href="https://www.youtube.com/watch?v=xZjvYMuAJPc&amp;t=2282s">38:02</a>) Why the METR paper resonated and how it was misunderstood</p><h2><strong>Referenced:</strong></h2><p>&#8226; <a href="https://getdx.com/whitepaper/ai-measurement-framework">Measuring AI code assistants and agents</a></p><p>&#8226; <a href="https://kiro.dev/">Kiro</a></p><p>&#8226; <a href="https://code.claude.com/">Claude Code - AI coding agent for terminal &amp; IDE</a></p><p>&#8226; <a href="https://getdx.com/blog/space-framework-primer/">SPACE framework: a quick primer</a></p><p>&#8226; <a href="https://dora.dev/research/2025/dora-report/">DORA | State of AI-assisted Software Development 2025</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/martin-fowler">Martin Fowler - by Gergely Orosz - The Pragmatic Engineer</a></p><p>&#8226; <a href="https://ieeexplore.ieee.org/document/10857384">Seamful AI for Creative Software Engineering: Use in Software Development Workflows | IEEE Journals &amp; Magazine | IEEE Xplore</a></p><p>&#8226; <a href="https://www.microsoft.com/en-us/research/publication/ai-where-it-matters-where-why-and-how-developers-want-ai-support-in-daily-work/">AI Where It Matters: Where, Why, and How Developers Want AI Support in Daily Work - Microsoft Research</a></p><p>&#8226; <a href="https://getdx.com/blog/unpacking-metri-findings-does-ai-slow-developers-down/">Unpacking METR&#8217;s findings: Does AI slow developers down?</a></p><p>&#8226; <a href="https://dxannual.com/">DX Annual 2026</a></p>]]></content:encoded></item><item><title><![CDATA[Running data-driven evaluations 
of AI engineering tools]]></title><description><![CDATA[A concise, data-driven framework for testing and adopting AI engineering tools.]]></description><link>https://newsletter.getdx.com/p/running-data-driven-evaluations-of</link><guid isPermaLink="false">https://newsletter.getdx.com/p/running-data-driven-evaluations-of</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 12 Dec 2025 15:49:16 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/181184615/8247b4109166deb4674678576f392b31.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/SQVdvKxzOH0">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>AI engineering tools are evolving fast. Every month brings new coding assistants, debugging agents, and automation capabilities. I want to help engineering leaders take advantage of that innovation while avoiding costly experiments that distract from real product work.</p><p>In this episode, Abi Noda and I share a practical, data-driven approach to evaluating AI tools. I walk through how to shortlist tools by use case, design structured trials that reflect real work, select representative participants, and measure impact using baselines and proven frameworks. My goal is to give you a way to test and adopt AI tools with confidence and a clear return on investment.</p><div id="youtube2-SQVdvKxzOH0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;SQVdvKxzOH0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/SQVdvKxzOH0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>Data-driven evaluations are essential</strong></p><ul><li><p><strong>Structured, measurable trials prevent bias.</strong> Without them, decisions are driven by novelty hype or a few loud voices.</p></li><li><p><strong>Define a clear business outcome first</strong> (reduce toil, improve delivery speed, or raise code quality).</p></li><li><p><strong>Evaluations must inform real decisions</strong>, not just check a procurement box.</p></li></ul><p><strong>Choose the right set of tools to evaluate</strong></p><ul><li><p><strong>Group tools by use case and interaction mode</strong> (chat, agentic IDEs, code review assistants, etc.) 
to ensure fair comparisons.</p></li><li><p><strong>Match shortlist size to org capacity</strong> to support multiple cohorts and reliable results.</p></li><li><p><strong>Multi-vendor strategies reduce lock-in</strong> in a rapidly shifting market.</p></li></ul><p><strong>Re-evaluations are essential, not optional</strong></p><ul><li><p><strong>Incumbent tools must be retested</strong> as capabilities evolve and new challengers emerge.</p></li><li><p><strong>Triggers for re-evaluation</strong> include major feature launches, organic developer adoption of a new tool, and upcoming renewal cycles.</p></li><li><p><strong>Every challenger tool evaluation requires a baseline of the incumbent</strong>, so you can compare like-for-like.</p></li><li><p><strong>A cadence of every 8&#8211;14 months</strong> ensures decisions reflect the current reality, not a past purchase decision.</p></li></ul><p><strong>Design trials around research questions</strong></p><ul><li><p><strong>Start with a hypothesis.</strong> It keeps experiments aligned to actual goals.</p></li><li><p><strong>Developer sentiment is necessary but insufficient</strong> without measurable outcomes.</p></li><li><p><strong>Success criteria must be defined in advance</strong> to avoid subjective decision-making.</p></li></ul><p><strong>Select representative participants</strong></p><ul><li><p><strong>Diverse cohorts reveal real impact</strong> across languages, teams, and seniority levels.</p></li><li><p><strong>Include skeptical and late adopters</strong> to uncover onboarding and enablement needs.</p></li><li><p><strong>Volunteer-only trials distort results</strong> and won&#8217;t scale to full org rollout.</p></li></ul><p><strong>Run evaluations long enough to capture true behavior</strong></p><ul><li><p><strong>Eight to twelve weeks is the minimum</strong> to get past the novelty phase and into sustained usage.</p></li><li><p><strong>Align evaluation windows to procurement cycles</strong> so insights guide buying decisions.</p></li><li><p><strong>Short trials lead to false signals</strong> and either inflate enthusiasm or create false negativity.</p></li></ul><p><strong>Use self-reported time savings carefully</strong></p><ul><li><p><strong>Self-reporting is a strong early indicator</strong> of perceived usefulness.</p></li><li><p><strong>Humans misremember time</strong>, often benchmarking against recent AI use.</p></li><li><p><strong>Treat CSAT and time savings as directional</strong>, not the final truth.</p></li><li><p><strong>Objective metrics validate real ROI,</strong> including throughput, quality, and innovation time.</p></li></ul><p><strong>Expect variation rather than a single winner</strong></p><ul><li><p><strong>Different tools shine in different contexts</strong>, so multiple standards are often the best path.</p></li><li><p><strong>Continuous re-evaluation is required</strong> as capabilities evolve every quarter.</p></li><li><p><strong>The right goal isn&#8217;t the &#8220;best tool&#8221;</strong>, but the best tool for <em>each</em> problem space.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0">00:00</a>) Intro: Running a data-driven evaluation of AI tools</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=156s">02:36</a>) Challenges in evaluating AI tools</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=371s">06:11</a>) How often to reevaluate AI tools</p><p>(<a
href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=422s">07:02</a>) Incumbent tools vs challenger tools</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=460s">07:40</a>) Why organizations need disciplined evaluations before rolling out tools</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=568s">09:28</a>) How to size your tool shortlist based on developer population</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=764s">12:44</a>) Why tools must be grouped by use case and interaction mode</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=810s">13:30</a>) How to structure trials around a clear research question</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=1005s">16:45</a>) Best practices for selecting trial participants</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=1162s">19:22</a>) Why support and enablement are essential for success</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=1270s">21:10</a>) How to choose the right duration for evaluations</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=1372s">22:52</a>) How to measure impact using baselines and the AI Measurement Framework</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=1528s">25:28</a>) Key considerations for an AI tool evaluation</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=1732s">28:52</a>) Q&amp;A: How reliable is self-reported time savings from AI tools?</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=1942s">32:22</a>) Q&amp;A: Why not adopt multiple tools instead of choosing just one?</p><p>(<a href="https://www.youtube.com/watch?v=SQVdvKxzOH0&amp;t=2007s">33:27</a>) Q&amp;A: Tool performance differences and avoiding vendor lock-in</p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; X: <a href="https://x.com/rhein_wein">https://x.com/rhein_wein</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p>&#8226; Laura&#8217;s course (Measuring Engineering Performance and AI Impact): <a href="https://lauratacho.com/developer-productivity-metrics-course">https://lauratacho.com/developer-productivity-metrics-course</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><p>&#8226; Substack: &#8203;&#8203;<a href="https://substack.com/@abinoda">https://substack.com/@abinoda</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/whitepaper/ai-measurement-framework/">Measuring AI code assistants and agents</a></p></li><li><p><a href="https://qconferences.com/">QCon conferences</a></p></li><li><p><a href="https://getdx.com/dx-core-4/">DX Core 4 engineering metrics</a></p></li><li><p><a href="https://getdx.com/podcast/doras-2025-research-on-the-impact-of-ai/">DORA&#8217;s 2025 research on the impact of AI</a></p></li><li><p><a href="https://getdx.com/blog/unpacking-metri-findings-does-ai-slow-developers-down/">Unpacking METR&#8217;s findings: Does AI slow developers down?</a></p></li><li><p><a href="https://newsletter.getdx.com/p/metr-study-on-how-ai-affects-developer-productivity">METR&#8217;s study on how AI affects developer productivity</a></p></li><li><p><a href="https://www.claude.com/product/claude-code">Claude 
Code</a></p></li><li><p><a href="https://cursor.com/">Cursor</a></p></li><li><p><a href="https://windsurf.com/">Windsurf</a></p></li><li><p><a href="https://newsletter.getdx.com/p/do-newer-ai-native-ides-outperform-other-ai-coding-assistants">Do newer AI-native IDEs outperform other AI coding assistants?</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[DORA’s 2025 research on the impact of AI]]></title><description><![CDATA[A conversation with DORA&#8217;s Nathen Harvey on how AI is transforming engineering systems and why leaders need new metrics to understand its real impact.]]></description><link>https://newsletter.getdx.com/p/doras-2025-research-on-the-impact</link><guid isPermaLink="false">https://newsletter.getdx.com/p/doras-2025-research-on-the-impact</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 21 Nov 2025 19:41:18 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/179485934/3d58eaf00c52e0f638a5ae64ee02c660.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/W1Qqr4ferz4">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this episode of <em>Engineering Enablement</em>, I sit down with Nathen Harvey, who leads research at DORA, to explore how teams should really think about measuring the impact of AI. We talk about why traditional delivery metrics can give leaders a false sense of confidence and how AI acts as an amplifier, accelerating healthy systems while intensifying existing friction and failure.</p><p>We examine findings from the 2025 DORA research on AI-assisted software development alongside DX&#8217;s Q4 AI Impact report and unpack where the data aligns and where meaningful gaps emerge. We also dig into how AI is reshaping engineering systems themselves, changing workflows, feedback loops, and team dynamics in ways leaders need to understand to achieve real, sustainable impact.</p><div id="youtube2-W1Qqr4ferz4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;W1Qqr4ferz4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/W1Qqr4ferz4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>DORA metrics alone cannot measure AI impact</strong></p><ul><li><p><strong>The four &#8220;key&#8221; DORA metrics only reflect delivery outcomes, not system behavior.</strong> They show where teams end up, not how they got there.</p></li><li><p>DORA now measures five software delivery performance metrics, not four.</p></li><li><p>These metrics function like a compass rather than a diagnostic tool.</p></li><li><p><strong>Delivery performance metrics are leading indicators of organizational health</strong> but lagging indicators of engineering practices.</p></li></ul><p><strong>AI acts as an organizational amplifier</strong></p><ul><li><p><strong>AI does not fix systems; it intensifies what already exists. 
</strong>Strong practices compound while weak practices become more painful.</p></li><li><p>Healthy teams experience faster flow while unhealthy systems accumulate more visible friction.</p></li><li><p>AI makes hidden bottlenecks impossible to ignore.</p></li></ul><p><strong>The five DORA software delivery performance metrics</strong></p><ul><li><p><strong>DORA divides delivery performance into throughput and instability categories.</strong></p></li><li><p>Throughput metrics include lead time for changes, deployment frequency, and failed deployment recovery time.</p></li><li><p>Instability metrics include change fail rate and deployment rework rate.</p></li></ul><p><strong>DX Q4 2025 AI Impact report insights</strong></p><ul><li><p><strong>Junior engineers adopt AI more heavily than senior engineers.</strong> This shifts how work is distributed across teams.</p></li><li><p><strong>Senior engineers often capture more measurable time savings despite lower visible usage.</strong></p></li><li><p>DX found widespread experimentation with non-enterprise AI tools.</p></li><li><p>Engineers reported high AI usage even when enterprise telemetry showed no activity.</p></li><li><p>Shadow experimentation reflects weak or unclear organizational AI guidance.</p></li></ul><p><strong>The DORA AI Capabilities Model</strong></p><ul><li><p><strong>Successful AI adoption depends on team and organizational capabilities, not tool selection.</strong></p></li><li><p>A clear and communicated AI stance reduces uncertainty and speeds adoption.</p></li><li><p>A healthy internal data ecosystem prevents AI usage from being blocked by silos.</p></li><li><p><strong>Internal policies and documentation must be accessible to both humans and AI systems.</strong></p></li><li><p><strong>Strong version control provides rollback safety when AI-generated code diverges.</strong></p></li><li><p>Small batch work improves both AI output quality and system stability.</p></li><li><p>User-centered thinking ensures AI effort aligns with real human outcomes.</p></li><li><p>High-quality internal platforms allow improvements to scale across teams.</p></li></ul><p><strong>AI shifts where work breaks</strong></p><ul><li><p><strong>AI accelerates code creation but moves constraints downstream.</strong></p></li><li><p><strong>Code review becomes the dominant bottleneck</strong> under AI-assisted development.</p></li><li><p>Increased code volume without improved review systems slows overall throughput.</p></li><li><p>Bottlenecks become more visible, not less, as AI usage grows.</p></li></ul><p><strong>Measuring AI ROI requires human signals</strong></p><ul><li><p><strong>Dashboards cannot capture where work feels slow or painful</strong>.</p></li><li><p>Leaders need direct conversations with engineers about friction and workflow breakdowns.</p></li><li><p>Qualitative insight exposes failure points that metrics cannot surface.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=W1Qqr4ferz4">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=W1Qqr4ferz4&amp;t=55s">00:55</a>) Why the four key DORA metrics aren&#8217;t enough to measure AI impact</p><p>(<a href="https://www.youtube.com/watch?v=W1Qqr4ferz4&amp;t=224s">03:44</a>) The shift from four to five DORA metrics and why leaders need more than dashboards</p><p>(<a href="https://www.youtube.com/watch?v=W1Qqr4ferz4&amp;t=380s">06:20</a>) The one-sentence takeaway from the 2025 DORA report</p><p>(<a 
href="https://www.youtube.com/watch?v=W1Qqr4ferz4&amp;t=458s">07:38</a>) How AI amplifies both strengths and bottlenecks inside engineering systems</p><p>(<a href="https://www.youtube.com/watch?v=W1Qqr4ferz4&amp;t=538s">08:58</a>) What DX data reveals about how junior and senior engineers use AI differently</p><p>(<a href="https://www.youtube.com/watch?v=W1Qqr4ferz4&amp;t=633s">10:33</a>) The DORA AI Capabilities Model and why AI success depends on how it&#8217;s used</p><p>(<a href="https://www.youtube.com/watch?v=W1Qqr4ferz4&amp;t=1104s">18:24</a>) How a clear and communicated AI stance improves adoption and reduces friction</p><p>(<a href="https://www.youtube.com/watch?v=W1Qqr4ferz4&amp;t=1382s">23:02</a>) Why talking to your teams still matters</p><p><strong>Where to find Nathen Harvey:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/nathen">https://www.linkedin.com/in/nathen</a></p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; X: <a href="https://x.com/rhein_wein">https://x.com/rhein_wein</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p>&#8226; Laura&#8217;s course (Measuring Engineering Performance and AI Impact) <a href="https://lauratacho.com/developer-productivity-metrics-course">https://lauratacho.com/developer-productivity-metrics-course</a></p><h2><strong>Referenced:</strong></h2><p>&#8226; <a href="https://dora.dev/research/2025/dora-report/">DORA | State of AI-assisted Software Development 2025</a></p><p>&#8226; <a href="https://www.linkedin.com/in/stevefenton/">Steve Fenton - Octonaut | LinkedIn</a></p><p>&#8226; <a href="https://getdx.com/report/ai-assisted-engineering-q4-impact-report/?utm_source=podcast">AI-assisted engineering: Q4 impact report</a></p>]]></content:encoded></item><item><title><![CDATA[How Monzo runs data-driven AI experimentation]]></title><description><![CDATA[Monzo&#8217;s Fabien Deshayes shares how the bank is adopting AI responsibly&#8212;running structured trials, measuring impact, and balancing innovation with control.]]></description><link>https://newsletter.getdx.com/p/how-monzo-runs-data-driven-ai-experimentation</link><guid isPermaLink="false">https://newsletter.getdx.com/p/how-monzo-runs-data-driven-ai-experimentation</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 31 Oct 2025 16:05:48 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/177291638/f983aaaf456b2fa6b7d448c8f58a1dff.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/j0KLLLQ_yzs">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this episode of <em>Engineering Enablement</em>, I talk with Fabien Deshayes, who leads multiple platform engineering teams at Monzo Bank. Fabien shares how Monzo is adopting AI responsibly within a highly regulated industry, balancing innovation with control, structure, and strong data-driven decision-making.</p><p>We discuss how Monzo runs structured AI trials, measures adoption and satisfaction, and uses metrics to guide investment and training. 
Fabien explains why they moved from broad rollouts to small, focused cohorts, how they are addressing existing PR review bottlenecks that AI has intensified, and what they have learned from empowering product managers and designers to use AI tools directly.</p><p>He also shares insights into budgeting and experimentation, the results they are seeing from AI-assisted engineering, and his vision for what is next, from agent orchestration to more seamless collaboration across roles.</p><div id="youtube2-j0KLLLQ_yzs" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;j0KLLLQ_yzs&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/j0KLLLQ_yzs?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>Evaluating AI in a regulated industry</strong></p><ul><li><p><strong>Monzo treats AI enablement as a platform responsibility.</strong> The platform group centralizes tooling to ensure guardrails are in place for privacy, data access, and model usage across regulated domains.</p></li><li><p><strong>Structure comes before scale.</strong> Before expanding access, Monzo limits tools to trained users and controlled contexts to reduce risk while maintaining speed of innovation.</p></li><li><p><strong>Regulation became an advantage.</strong> The constraints of banking forced deliberate decision-making, leading to a tighter focus, better change management, and stronger adoption.</p></li></ul><p><strong>Experimentation strategy</strong></p><ul><li><p><strong>AI adoption began with GitHub Copilot but lacked direction.</strong> Early usage was inconsistent, prompting Monzo to create structured trials with clear evaluation goals.</p></li><li><p><strong>Each trial follows a defined process.</strong> The team selects one or two tools at a time, identifies specific use cases, and measures success against defined metrics such as adoption, satisfaction, and retention.</p></li><li><p><strong>Trials are intentionally small and focused.</strong> Monzo learned that large-scale rollouts dilute feedback and waste time. 
Smaller cohorts of engineers with defined use cases yield higher-quality insights.</p></li></ul><p><strong>Evaluation criteria and metrics</strong></p><ul><li><p><strong>Success is measured across ten to fifteen criteria.</strong> These include weekly and monthly active users, retention, percentage of AI-written code, suggestion acceptance rate, satisfaction, and cost.</p></li><li><p><strong>Use-case analysis is part of every trial.</strong> Tools are compared across workflows like refactoring, documentation, migrations, and new feature creation to understand strengths and weaknesses.</p></li><li><p><strong>Quantitative data is paired with survey feedback.</strong> Combining usage telemetry with engineer sentiment helps Monzo decide whether to scale a tool or end a trial.</p></li></ul><p><strong>Budgeting and cost control</strong></p><ul><li><p><strong>Monzo sets clear spending limits before experimentation.</strong> After modeling expected usage, the team landed on a target of roughly $1,000 per engineer per year for AI tooling.</p></li><li><p><strong>Budget discussions include leadership from the start.</strong> Transparent alignment with the CTO and finance ensures experimentation can continue without surprise costs.</p></li><li><p><strong>Cost is balanced against value.</strong> Data from tool providers and internal telemetry validate ROI, while quarterly reviews adjust allocations based on usage trends.</p></li></ul><p><strong>Engineering outcomes</strong></p><ul><li><p><strong>AI is increasing throughput and code volume.</strong> Pull requests per engineer are up 10&#8211;20%, with the average PR size growing by a similar amount. Around 20% of new code is now AI-generated.</p></li><li><p><strong>Code review time is the new bottleneck.</strong> Larger, AI-assisted PRs take longer to review, creating a productivity drag that Monzo is actively addressing.</p></li><li><p><strong>Preview environments are solving review delays.</strong> New &#8220;tenancies&#8221; allow engineers to test changes in isolated environments, improving speed and stability across teams.</p></li></ul><p><strong>Cross-functional adoption</strong></p><ul><li><p><strong>AI use is spreading beyond engineering.</strong> Product managers and designers now prototype directly using AI tools integrated with Monzo&#8217;s design system.</p></li><li><p><strong>Better context leads to better results.</strong> By exposing design components and brand guidelines to models, teams produce higher-quality prototypes that reduce rework later in development.</p></li></ul><p><strong>Leadership and enablement</strong></p><ul><li><p><strong>Community drives adoption more than mandates.</strong> AI Engineering Champions across departments test tools, share learnings, and help others apply best practices in their collectives.</p></li><li><p><strong>Upskilling is Monzo&#8217;s next investment.</strong> With only half of engineers feeling confident using AI, the focus is shifting to prompting education, internal workshops, and structured learning.</p></li><li><p><strong>Success stories accelerate momentum.</strong> Highlighting early wins through Slack channels and company meetings keeps engineers engaged and inspired to experiment.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=61s">01:01</a>) An overview of Monzo bank and Fabien&#8217;s role</p><p>(<a 
href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=125s">02:05</a>) Monzo&#8217;s careful, structured approach to AI experimentation</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=330s">05:30</a>) How Monzo&#8217;s AI journey began</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=386s">06:26</a>) Why Monzo chose a structured approach to experimentation and what criteria they used</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=561s">09:21</a>) How Monzo selected AI tools for experimentation</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=711s">11:51</a>) Why individual tool stipends don&#8217;t work for large, regulated organizations</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=932s">15:32</a>) How Monzo measures the impact of AI tools and uses the data</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=1090s">18:10</a>) Why Monzo limits AI tool trials to small, focused cohorts</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=1254s">20:54</a>) The phases of Monzo&#8217;s AI rollout and how learnings are shared across the organization</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=1363s">22:43</a>) What Monzo&#8217;s data reveals about AI usage and spending</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=1470s">24:30</a>) How Monzo balances AI budgeting with innovation</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=1605s">26:45</a>) Results from DX&#8217;s spending poll and general advice on AI budgeting</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=1683s">28:03</a>) What Monzo&#8217;s data shows about AI&#8217;s impact on engineering performance</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=1790s">29:50</a>) The growing bottleneck in PR reviews and how Monzo is solving it with tenancies</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=2034s">33:54</a>) How product managers and designers are using AI at Monzo</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=2196s">36:36</a>) Fabien&#8217;s advice for moving the needle with AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=j0KLLLQ_yzs&amp;t=2322s">38:42</a>) The biggest changes coming next in AI engineering</p><p><strong>Where to find Fabien Deshayes:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/fabiendeshayes">https://www.linkedin.com/in/fabiendeshayes</a></p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; X: <a href="https://x.com/rhein_wein">https://x.com/rhein_wein</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p>&#8226; Laura&#8217;s course (Measuring Engineering Performance and AI Impact) <a href="https://lauratacho.com/developer-productivity-metrics-course">https://lauratacho.com/developer-productivity-metrics-course</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://monzo.com/">Monzo</a></p></li><li><p><a href="https://go.dev/">The Go Programming Language</a></p></li><li><p><a href="https://www.swift.org/">Swift.org</a></p></li><li><p><a href="https://kotlinlang.org/">Kotlin</a></p></li><li><p><a href="https://code.visualstudio.com/docs/copilot/overview">GitHub Copilot in VS Code</a></p></li><li><p><a 
href="https://cursor.com/">Cursor</a></p></li><li><p><a href="https://windsurf.com/">Windsurf</a></p></li><li><p><a href="https://www.claude.com/product/claude-code">Claude Code</a></p></li><li><p><a href="https://getdx.com/podcast/planning-2026-ai-tooling-budget/">Planning your 2026 AI tooling budget: guidance for engineering leaders</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Planning your 2026 AI tooling budget: guidance for engineering leaders]]></title><description><![CDATA[Abi Noda and I share how engineering leaders can plan effective 2026 AI budgets that balance innovation, cost, and measurable ROI.]]></description><link>https://newsletter.getdx.com/p/planning-your-2026-ai-tooling-budget</link><guid isPermaLink="false">https://newsletter.getdx.com/p/planning-your-2026-ai-tooling-budget</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 17 Oct 2025 16:23:54 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/176199495/b680e7d998be9e828e3476df4306e58e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/myUmFrxxUOk">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this webinar, Abi Noda and I explore how engineering leaders can plan their 2026 AI budgets amid rapid change and rising costs. Using data from DX&#8217;s recent poll and industry benchmarks, we break down how much to spend per developer, how to allocate budgets across tools, and how to balance innovation with cost control.</p><p>We also discuss practical strategies for building a multi-vendor approach, choosing the right metrics before and after adoption, and proving ROI through continuous measurement. 
Along the way, we share insights on communicating AI&#8217;s value to executives, avoiding cost-cutting narratives, and investing in enablement and training to make adoption last.</p><div id="youtube2-myUmFrxxUOk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;myUmFrxxUOk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/myUmFrxxUOk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>Planning Your 2026 AI Budget</strong></p><ul><li><p><strong>Budgets are shifting from experimental to essential.</strong> In 2024&#8211;2025, most AI tool spending was discretionary and exploratory, but by 2026, it becomes a recurring line item with clear ROI expectations.</p></li><li><p><strong>Expect rising costs.</strong> Tools are expanding their capabilities and pricing accordingly; organizations should plan for year-over-year increases.</p></li><li><p><strong>Average spending is climbing.</strong> The floor for AI tooling is now about <strong>$500&#8211;$1,000 per developer per year</strong>, but multi-vendor setups can easily push that higher.</p></li></ul><p><strong>Allocating your AI budget</strong></p><ul><li><p><strong>Plan for diversity across the SDLC.</strong> Modern AI adoption means using multiple tools&#8212;chat interfaces, IDE copilots, background agents, and specialized tools&#8212;to cover distinct use cases.</p></li><li><p><strong>Shift from headcount expansion to efficiency.</strong> Many companies are reallocating budget from new hires to automation and tooling&#8212;not cutting jobs, but slowing growth to focus on productivity.</p></li><li><p><strong>Track budget maturity.</strong> Move from &#8220;experimental&#8221; AI spending to a structured budget that includes telemetry, measurement, and enablement layers.</p></li></ul><p><strong>Vendor and tooling strategy</strong></p><ul><li><p><strong>A multi-vendor approach is now the norm.</strong> Locking into one platform risks missing out on faster-evolving tools and model improvements.</p></li><li><p><strong>Enterprise licensing vs. 
stipends.</strong> Enterprise plans streamline enablement and reporting but carry overage risks; stipends are predictable but lack team-level visibility and economies of scale.</p></li><li><p><strong>Cost control is an active process.</strong> Use telemetry and regular reviews to ensure spending aligns with actual value delivered across teams.</p></li></ul><p><strong>Measurement and ROI</strong></p><ul><li><p><strong>Use data before and after adoption.</strong> Measure both during proof-of-concepts and in production to understand which tools truly move the needle.</p></li><li><p><strong>The right metrics matter.</strong> Track usage tiers (daily, weekly, monthly), time savings, developer satisfaction, and percentage of AI-generated code&#8212;while acknowledging how hard these are to measure accurately.</p></li><li><p><strong>ROI is more than throughput.</strong> Developers spend only about 14% of their time coding; improving that slice doesn&#8217;t guarantee overall productivity gains.</p></li></ul><p><strong>Communication and leadership</strong></p><ul><li><p><strong>Frame AI as acceleration, not automation.</strong> Messaging that focuses on cost-cutting can backfire&#8212;leaders should position AI as a way to stay competitive, not reduce headcount.</p></li><li><p><strong>Language matters.</strong> Talk about <em>time recaptured</em> or <em>capacity gained</em> instead of cost savings to avoid triggering budget cuts.</p></li><li><p><strong>Leadership advocacy drives adoption.</strong> Developers are far more likely to use AI tools daily when leaders promote them clearly and consistently.</p></li></ul><p><strong>Enablement and training</strong></p><ul><li><p><strong>Training is non-negotiable.</strong> AI adoption stalls without enablement; companies investing in experiential training see stronger, sustained usage.</p></li><li><p><strong>High-quality programs pay off.</strong> Workshops and accelerators that pair real business problems with hands-on learning cost roughly <strong>$500&#8211;$2,000 per developer</strong>&#8212;and are worth every dollar.</p></li><li><p><strong>Measure enablement ROI.</strong> Track adoption rates before and after training to validate impact and justify continued investment.</p></li></ul><p><strong>Trends to watch</strong></p><ul><li><p><strong>AI add-ons are expanding.</strong> Expect price hikes as existing tools add AI features&#8212;often as premium tiers or separate modules.</p></li><li><p><strong>Custom models are not yet mainstream.</strong> Fine-tuning and bespoke models are still limited to large enterprises; most companies can focus on leveraging general models effectively.</p></li><li><p><strong>Data-driven decisions will define leaders.</strong> Teams that collect usage data, track ROI, and adjust budgets proactively will outpace those relying on assumptions.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk">00:00</a>) Intro: Setting the stage for AI budgeting in 2026</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=105s">01:45</a>) Results from DX&#8217;s AI spending poll and early trends</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=210s">03:30</a>) How companies are currently spending and what to watch in 2026</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=292s">04:52</a>) Why clear definitions for AI tools matter and how Laura and Abi think about them</p><p>(<a 
href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=432s">07:12</a>) The entry point for 2026 AI tooling budgets and emerging spending patterns</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=614s">10:14</a>) Why 2026 is the year to prove ROI on AI investments</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=670s">11:10</a>) How organizations should approach AI budgeting and allocation</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=908s">15:08</a>) Best practices for managing AI vendors and enterprise licensing</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=1022s">17:02</a>) How to define and choose metrics before and after adopting AI tools</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=1170s">19:30</a>) How to identify bottlenecks and AI use cases with the highest ROI</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=1318s">21:58</a>) Key considerations for AI budgeting</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=1510s">25:10</a>) Why AI investments are about competitiveness, not cost-cutting</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=1639s">27:19</a>) How to use the right language to build trust and executive buy-in</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=1698s">28:18</a>) Why training and enablement are essential parts of AI investment</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=1900s">31:40</a>) How AI add-ons may increase your tool costs</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=1967s">32:47</a>) Why custom and fine-tuned models aren&#8217;t relevant for most companies today</p><p>(<a href="https://www.youtube.com/watch?v=myUmFrxxUOk&amp;t=2040s">34:00</a>) The tradeoffs between stipend models and enterprise AI licenses</p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><p>&#8226; Substack: &#8203;&#8203;<a href="https://substack.com/@abinoda">https://substack.com/@abinoda</a></p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; X: <a href="https://x.com/rhein_wein">https://x.com/rhein_wein</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p>&#8226; Laura&#8217;s course (Measuring Engineering Performance and AI Impact): <a href="https://lauratacho.com/developer-productivity-metrics-course">https://lauratacho.com/developer-productivity-metrics-course</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://getdx.com/research/measuring-ai-code-assistants-and-agents/">Measuring AI code assistants and agents</a></p></li><li><p><a href="https://www.iconiqcapital.com/growth/reports/2025-state-of-ai">2025 State of AI Report: The Builder&#8217;s Playbook</a></p></li><li><p><a href="https://github.com/features/copilot">GitHub Copilot &#183; Your AI pair programmer</a></p></li><li><p><a href="https://cursor.com/">Cursor</a></p></li><li><p><a href="https://www.glean.com/">Glean</a></p></li><li><p><a href="https://www.claude.com/product/claude-code">Claude Code</a></p></li><li><p><a href="https://chatgpt.com/">ChatGPT</a></p></li><li><p><a 
href="https://windsurf.com/">Windsurf</a></p></li><li><p><a href="https://getdx.com/blog/dx-releases-integration-with-claude-code/">Track Claude Code adoption, impact, and ROI, directly in DX</a></p></li><li><p><a href="https://getdx.com/podcast/measuring-ai-code-assistants-ai-framework/">Measuring AI code assistants and agents with the AI Measurement Framework</a></p></li><li><p><a href="https://getdx.com/podcast/enterprise-wide-ai-tool-adoption/">Driving enterprise-wide AI tool adoption</a></p></li><li><p><a href="https://sentry.io/welcome/">Sentry</a></p></li><li><p><a href="https://poolside.ai/">Poolside</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[The evolving role of DevProd teams in the AI era]]></title><description><![CDATA[I sit down with DX CTO Laura Tacho to share how AI is transforming Platform and DevProd teams, from tool standardization to reducing tech debt and modernizing workflows.]]></description><link>https://newsletter.getdx.com/p/the-evolving-role-of-devprod-teams</link><guid isPermaLink="false">https://newsletter.getdx.com/p/the-evolving-role-of-devprod-teams</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 26 Sep 2025 15:36:54 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/173865452/a1e7e3ee0db2f8e8f86f402e17f8b6ed.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/Q8wMGNzWc7s">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>Today, I am joined once again by our CTO, Laura Tacho, to explore how the role of Platform and DevProd teams is evolving in the AI era. We discuss why evaluating and rolling out AI tools is now a core responsibility, how measurement frameworks must change, and why fundamentals like documentation and feedback loops matter more than ever for both developers and AI agents. 
We also dive into strategies for handling tool sprawl, hardening pipelines for increased throughput, and applying AI at scale to tech debt, migrations, and workflows across the SDLC.</p><div id="youtube2-Q8wMGNzWc7s" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Q8wMGNzWc7s&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Q8wMGNzWc7s?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>Platform teams in transition</strong></p><ul><li><p><strong>Platform teams are redefining their mandate in the AI era.</strong> They&#8217;re moving beyond CI/CD into AI evaluation, rollout, and measurement, taking on more strategic influence across the SDLC.</p></li><li><p><strong>Expectations on leaders are rising.</strong> Platform teams are now expected to guide both productivity and AI adoption, balancing technical and cultural change.</p></li></ul><p><strong>Defining the platform role</strong></p><ul><li><p><strong>The platform role is hard to define and AI raises the stakes.</strong> Teams must bridge old responsibilities with new AI-driven expectations.</p></li><li><p><strong>Clarity creates leverage.</strong> Clear mandates help leaders align stakeholders and secure resources.</p></li></ul><p><strong>Evaluation and measurement</strong></p><ul><li><p><strong>Evaluating and rolling out AI tools is now a Platform role.</strong> Platform leaders handle vendor selection, pilots, and enterprise rollouts while tracking impact.</p></li><li><p><strong>Strong measurement frameworks are essential.</strong> Teams need data-driven ways to evaluate AI&#8217;s impact beyond lines of code, blending old and new metrics for a realistic picture.</p></li><li><p><strong>Platform leaders are internal educators.</strong> Explaining metrics to executives avoids hype-driven decisions and builds trust in AI-assisted development.</p></li></ul><p><strong>Hardening systems and guardrails</strong></p><ul><li><p><strong>AI-generated code stresses pipelines and build systems.</strong> Platform teams must harden CI/CD, testing, and local environments to handle larger batch sizes and faster iterations.</p></li><li><p><strong>Guardrails complement throughput.</strong> Quality checks, security reviews, and feedback loops maintain reliability while speed accelerates, laying groundwork for large-scale refactors and technical debt initiatives.</p></li></ul><p><strong>Standardization and knowledge sharing</strong></p><ul><li><p><strong>Training alone isn&#8217;t enough &#8212; standardization creates leverage.</strong> Curated workflows and reusable templates help maintain consistent AI adoption.</p></li><li><p><strong>Standardizing tools and knowledge multiplies impact.</strong> Shared workflows and internal documentation reduce duplication and accelerate learning.</p></li></ul><p><strong>Applying AI at scale</strong></p><ul><li><p><strong>Platform teams can apply AI directly for high-leverage work.</strong> Code migrations, technical debt reduction, and repetitive tasks become one-to-many accelerators rather than personal productivity boosts.</p></li><li><p><strong>Context as a service matters.</strong> Centralizing setup and best practices allows product teams to focus on business 
problems while platform teams handle complexity and modernization.</p></li><li><p><strong>AI tackles technical debt at scale.</strong> Platform teams can orchestrate migrations, refactors, and other neglected work for organization-wide benefit.</p></li></ul><p><strong>Focusing on fundamentals and the big picture</strong></p><ul><li><p><strong>Fundamentals benefit both developers and AI agents.</strong> Documentation, feedback loops, and clean codebases improve outcomes for LLMs just as they do for human developers.</p></li><li><p><strong>AI doesn&#8217;t erase old bottlenecks.</strong> Meetings, interruptions, and process inefficiencies still outweigh many AI productivity gains, making platform leaders the stewards of the bigger picture.</p></li></ul><p><strong>Looking beyond code authoring</strong></p><ul><li><p><strong>Opportunities go far beyond code authoring.</strong> Requirements, testing, validation, and even documentation are ripe areas for AI leverage.</p></li><li><p><strong>End-to-end adoption multiplies gains.</strong> Companies see the biggest improvements when AI is applied across the SDLC, creating a stronger developer experience overall.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s">00:00</a>) Intro: Why platform teams need to evolve</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=154s">02:34</a>) The challenge of defining platform teams and how AI is changing expectations</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=284s">04:44</a>) Why evaluating and rolling out AI tools is becoming a core platform responsibility</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=434s">07:14</a>) Why platform teams need solid measurement frameworks to evaluate AI tools</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=536s">08:56</a>) Why platform leaders should champion education and advocacy on measurement</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=680s">11:20</a>) How AI code stresses pipelines and why platform teams must harden systems</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=744s">12:24</a>) Why platform teams must go beyond training to standardize tools and create workflows</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=871s">14:31</a>) How platform teams control tool sprawl</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=982s">16:22</a>) Why platform teams need strong guardrails and safety checks</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=1121s">18:41</a>) The importance of standardizing tools and knowledge</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=1184s">19:44</a>) The opportunity for platform teams to apply AI at scale across the organization</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=1420s">23:40</a>) Quick recap of the key points so far</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=1473s">24:33</a>) How AI helps modernize legacy code and handle migrations</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=1545s">25:45</a>) Why focusing on fundamentals benefits both developers and AI agents</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=1662s">27:42</a>) Identifying SDLC bottlenecks beyond AI code generation</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=1808s">30:08</a>) Techniques for optimizing legacy code 
bases</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=1967s">32:47</a>) How AI helps tackle tech debt and large-scale code migrations</p><p>(<a href="https://www.youtube.com/watch?v=Q8wMGNzWc7s&amp;t=2140s">35:40</a>) Tools across the SDLC</p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><p>&#8226; Substack: &#8203;&#8203;<a href="https://substack.com/@abinoda">https://substack.com/@abinoda</a></p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; X: <a href="https://x.com/rhein_wein">https://x.com/rhein_wein</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p>&#8226; Laura&#8217;s course (Measuring Engineering Performance and AI Impact): <a href="https://lauratacho.com/developer-productivity-metrics-course">https://lauratacho.com/developer-productivity-metrics-course</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://getdx.com/whitepaper/ai-measurement-framework/">Measuring AI code assistants and agents</a></p></li><li><p><a href="https://www.linkedin.com/posts/abinoda_many-platform-teams-are-stuck-optimizing-activity-7356687949532483586-W-zG?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAABf37PYBgozFf00ihr4fkqjRtMnFajHkQ5E">Abi Noda&#8217;s LinkedIn post</a></p></li><li><p><a href="https://getdx.com/podcast/measuring-ai-code-assistants-ai-framework/">Measuring AI code assistants and agents with the AI Measurement Framework</a></p></li><li><p><a href="https://getdx.com/blog/space-metrics/">The SPACE framework: A comprehensive guide to developer productivity</a></p></li><li><p><a href="https://docs.anthropic.com/en/docs/claude-code/common-workflows">Common workflows - Anthropic</a></p></li><li><p><a href="https://itrevolution.com/product/enterprise-tech-leadership-summit-las-vegas/">Enterprise Tech Leadership Summit Las Vegas 2025</a></p></li><li><p><a href="https://getdx.com/podcast/enterprise-wide-ai-tool-adoption/">Driving enterprise-wide AI tool adoption with Bruno Passos</a></p></li><li><p><a href="https://medium.com/airbnb-engineering/accelerating-large-scale-test-migration-with-llms-9565c208023b">Accelerating Large-Scale Test Migration with LLMs | by Charles Covey-Brandt | The Airbnb Tech Blog | Medium</a></p></li><li><p><a href="https://www.linkedin.com/in/justinreock/">Justin Reock - DX | LinkedIn</a></p></li><li><p><a href="https://www.businessinsider.com/devgen-ai-tool-saved-morgan-stanley-280-000-hours-jobs-2025-7">A New Tool Saved Morgan Stanley More Than 280,000 Hours This Year - Business Insider</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Lessons from Twilio’s multi-year platform consolidation]]></title><description><![CDATA[Twilio&#8217;s Jesse Adametz shares lessons on unifying tech stacks, platform adoption, developer experience, and a pragmatic approach to AI.]]></description><link>https://newsletter.getdx.com/p/lessons-from-twilios-multi-year-platform</link><guid isPermaLink="false">https://newsletter.getdx.com/p/lessons-from-twilios-multi-year-platform</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 12 Sep 2025 14:00:12 GMT</pubDate><enclosure 
url="https://api.substack.com/feed/podcast/172890232/38d079cd12cec990735d2ce46377ee07.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/arzNyxQ-NEE">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>Jesse Adametz is a Senior Engineering Leader on the Developer Platform at Twilio, where he&#8217;s leading a multi-year effort to unify tech stacks across major acquisitions. I&#8217;m joined by him today to talk about the realities of platform adoption and migration, the challenges of Kubernetes, and why mandates don&#8217;t work when you&#8217;re bringing together strong engineering cultures. We also dive into why Jesse treats developer experience as a product, what &#8220;change as a service&#8221; looks like in practice, and how Twilio is approaching AI with pragmatism, experimentation, and a focus on real value.</p><div id="youtube2-arzNyxQ-NEE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;arzNyxQ-NEE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/arzNyxQ-NEE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>Platform adoption and migrations</strong></p><ul><li><p><strong>Consolidation is multi-year and non-linear.</strong> Twilio&#8217;s integration of Segment, SendGrid, and core stacks showed that platform consolidation is never a straight path. Early decisions, like launching EKS in Europe first for data residency before migrating U.S. workloads, highlight the inevitable &#8220;zigzags.&#8221;</p></li><li><p><strong>Migrations are about people as much as tech.</strong> Developers won&#8217;t abandon working systems on command. Advocacy, training, and executive alignment are critical &#8212; Jesse points to a past effort to put all engineers on call that only worked with CEO buy-in and steady rollout.</p></li><li><p><strong>Balance greenfield and brownfield.</strong> Twilio puts new services onto golden paths such as standardized Java pipelines, while gradually remediating older, bespoke setups. Greenfield adoption refines the standards that brownfield teams follow once value is proven.</p></li><li><p><strong>Carrots work better than mandates.</strong> Scorecards and contextual quality checks nudge teams toward best practices, while hard stops are reserved for critical reliability or security gaps.</p></li></ul><p><strong>Developer experience as a product</strong></p><ul><li><p><strong>Treat platforms like products.</strong> Twilio surveys developers, tracks NPS, and requires alpha customers before building new tools. 
If no teams are ready to test, platform focuses on paying down tech debt instead.</p></li><li><p><strong>Centralize the cost of change.</strong> Rather than hundreds of teams repeating the same migrations, platform absorbs the toil, standardizing and automating on-ramps so engineers get smoother adoption paths.</p></li><li><p><strong>Change as a service reduces pain.</strong> The long-term goal is pre-tested PRs with confidence scores, making migrations as simple as review and merge. This requires standardized tests, frameworks, and rollback strategies to scale safely.</p></li><li><p><strong>Show value early and often.</strong> Developers move when they see clear improvements. Dashboards highlight service quality gains and consolidated metrics roll into leadership reviews to reinforce progress across levels.</p></li></ul><p><strong>Kubernetes and modernization</strong></p><ul><li><p><strong>Kubernetes isn&#8217;t always the answer.</strong> While it solved many problems, Twilio&#8217;s telephony and SIP workloads didn&#8217;t fit neatly. Forcing Kubernetes everywhere would have created more pain than value.</p></li><li><p><strong>Modernization is the bigger picture.</strong> Twilio emphasizes improvements like autoscaling groups, observability, and simplifying legacy systems rather than betting on a single orchestrator.</p></li><li><p><strong>Unify the developer experience.</strong> A consistent experience across stacks matters more than enforcing a single tool. Developers should interact with one UX regardless of what&#8217;s running underneath.</p></li><li><p><strong>Reveal complexity only when needed.</strong> Golden paths hide infrastructure details for most engineers, with advanced controls surfaced only for high-scale, internet-facing workloads.</p></li></ul><p><strong>AI adoption</strong></p><ul><li><p><strong>AI hype feels different from Kubernetes.</strong> Kubernetes shifted work sideways, but AI promises to remove repetitive tasks and elevate workflows. Adoption requires structured experiments, benchmarks, and cultural adaptation.</p></li><li><p><strong>AI is both art and science.</strong> Clear PRDs and strong prompting skills dramatically improve AI outputs, and platform teams should train developers in these practices.</p></li><li><p><strong>AI changes workflows, not just tools.</strong> Engineers spend less time writing code and more time reviewing it, requiring higher throughput and different habits.</p></li><li><p><strong>Experimentation and standardization must coexist.</strong> Twilio&#8217;s review boards enforce safety and vendor discipline while greenfield teams run short feedback loops to test new tools.</p></li><li><p><strong>Guilds expose platform gaps.</strong> The AI DevEx guild drew 400 members instantly, showing demand for shared learning. Over time, platform should absorb this role instead of leaving it to grassroots groups.</p></li><li><p><strong>AI can augment incidents, not replace engineers.</strong> Vendors promising self-healing agents are overhyped, but AI already helps with anomaly detection and other narrow, reliable applications.</p></li></ul><p><strong>Tool evaluation and reliability</strong></p><ul><li><p><strong>Pricing should encourage healthy behavior.</strong> Per-run or per-policy models discourage usage and visibility. 
Twilio favors per-seat or platform-level pricing that scales with engineering needs.</p></li><li><p><strong>Reliability is non-negotiable.</strong> As an infrastructure provider, Twilio demands &#8220;dial-tone&#8221; reliability from its vendors, weighing monitoring, availability, and scale just as heavily as price.</p></li></ul><p><strong>Support as a platform capability</strong></p><ul><li><p><strong>Internal developers deserve customer-grade support.</strong> Today&#8217;s fragmented Slack channels create noise; a single entry point ensures clarity and visibility.</p></li><li><p><strong>Support requests reveal automation opportunities.</strong> Analyzing tickets showed that automating just two processes could reduce requests by double digits.</p></li><li><p><strong>A single &#8220;oh button&#8221; sets the standard.</strong> One clear way to ask for help, backed by automation and human fallback, gives developers the same reliability expected in customer support.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=90s">01:30</a>) Jesse&#8217;s background and how he ended up at Twilio</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=240s">04:00</a>) What SRE teaches leaders and ICs</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=366s">06:06</a>) Where Twilio started the post-acquisition integration</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=502s">08:22</a>) Why platform migrations can&#8217;t follow a straight-line plan</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=605s">10:05</a>) How Twilio balances multiple strategies for migrations</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=750s">12:30</a>) The human side of change: advocacy, training, and alignment</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=1066s">17:46</a>) Treating developer experience as a first-class product</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=1300s">21:40</a>) What &#8220;change as a service&#8221; looks like in practice</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=1497s">24:57</a>) A mandateless approach: creating voluntary adoption through value</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=1730s">28:50</a>) How Twilio demonstrates value with metrics and reviews</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=1841s">30:41</a>) Why Kubernetes wasn&#8217;t the right fit for all Twilio workloads</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=2172s">36:12</a>) How Twilio decides when to expose complexity</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=2303s">38:23</a>) Lessons from Kubernetes hype and how AI demands more experimentation</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=2688s">44:48</a>) Where AI fits into Twilio&#8217;s platform strategy</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=2985s">49:45</a>) How guilds fill needs the platform team hasn&#8217;t yet met</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=3077s">51:17</a>) The future of platform in centralizing knowledge and standards</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=3272s">54:32</a>) How Twilio evaluates tools for fit, pricing, and reliability</p><p>(<a 
href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=3473s">57:53</a>) Where Twilio applies AI in reliability, and where Jesse is skeptical</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=3566s">59:26</a>) Laura&#8217;s vibe-coded side project built on Twilio</p><p>(<a href="https://www.youtube.com/watch?v=arzNyxQ-NEE&amp;t=3671s">1:01:11</a>) How external lessons shape Twilio&#8217;s approach to platform support and docs</p><p><strong>Where to find Jesse Adametz:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/jesseadametz/">https://www.linkedin.com/in/jesseadametz/</a></p><p>&#8226; X: <a href="https://x.com/jesseadametz">https://x.com/jesseadametz</a></p><p>&#8226; Website: <a href="https://www.jesseadametz.com/">https://www.jesseadametz.com/</a></p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; X: <a href="https://x.com/rhein_wein">https://x.com/rhein_wein</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p>&#8226; Laura&#8217;s course (Measuring Engineering Performance and AI Impact) <a href="https://lauratacho.com/developer-productivity-metrics-course">https://lauratacho.com/developer-productivity-metrics-course</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/whitepaper/ai-measurement-framework">The AI Measurement Framework</a></p></li><li><p><a href="https://www.experian.com/">Experian</a></p></li><li><p><a href="https://en.wikipedia.org/wiki/Transact-SQL">Transact-SQL - Wikipedia</a></p></li><li><p><a href="https://www.twilio.com/">Twilio</a></p></li><li><p><a href="https://kubernetes.io/">Kubernetes</a></p></li><li><p><a href="http://copilot.microsoft.com">Copilot</a></p></li><li><p><a href="https://www.anthropic.com/claude-code">Claude Code</a></p></li><li><p><a href="https://windsurf.com/">Windsurf</a></p></li><li><p><a href="https://cursor.com/">Cursor</a></p></li><li><p><a href="https://aws.amazon.com/bedrock/">Bedrock</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Driving enterprise-wide AI tool adoption]]></title><description><![CDATA[Booking.com&#8217;s Bruno Passos shares how the company rolled out AI to 3,000 engineers, overcame adoption and cultural hurdles, and built practices that turned skeptics into daily users.]]></description><link>https://newsletter.getdx.com/p/driving-enterprise-wide-ai-tool-adoption</link><guid isPermaLink="false">https://newsletter.getdx.com/p/driving-enterprise-wide-ai-tool-adoption</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 05 Sep 2025 15:40:37 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/172055063/da8d6c6e4c06377fd6586d1f77352b56.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/HuDtj_xHhis">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this episode, I talk with Bruno Passos, Product Lead for Developer Experience at Booking.com, about how the company is rolling out AI tools across a 3,000-person engineering team.</p><p>Bruno shares how Booking.com set ambitious innovation goals, why cultural change mattered as much as technology, and the education practices that turned hesitant developers 
into daily users. We also discuss the early barriers, from low adoption and knowledge gaps to procurement hurdles, and the interventions that worked, including learning paths, hackathon-style workshops, Slack communities, and centralized procurement. Today, Booking.com is already in the top 25 percent of companies for AI adoption.</p><div id="youtube2-HuDtj_xHhis" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;HuDtj_xHhis&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/HuDtj_xHhis?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>Booking.com&#8217;s goals for AI adoption</strong></p><ul><li><p><strong>The goal was to accelerate development and remove toil.</strong> Engineers could spend more time on innovation instead of maintenance.</p></li><li><p><strong>Booking.com set an ambitious &#8220;innovation ratio.&#8221;</strong> They aimed for 80&#8211;90% of engineer time on creative work, compared to a global top 10% benchmark of 65%.</p></li><li><p><strong>AI was also a lever for modernization.</strong> Even with a legacy codebase, leaders believed it could help re-platform, clean up technical debt, and shorten cycle times.</p></li><li><p><strong>The mindset was experimental from the start.</strong> AI wasn&#8217;t introduced because it was trendy but as a series of controlled experiments to uncover real business value.</p></li></ul><p><strong>Building adoption through enablement</strong></p><ul><li><p><strong>Adoption grew only after targeted enablement.</strong> Curated learning paths and structured training helped developers build confidence.</p></li><li><p><strong>Two-day learning sprints became a turning point.</strong> Day one focused on prompt engineering and context handling, while day two paired developers with real business problems.</p></li><li><p><strong>Developers brought their own problems to hackathons.</strong> This made training immediately relevant and produced artifacts that solved real issues.</p></li><li><p><strong>Experience-based accelerators (EBAs) scaled the model.</strong> In these 3&#8211;5 day focused sessions, developers worked cross-functionally with providers, and about 70% of the code produced was AI-assisted.</p></li><li><p><strong>Providers were treated as partners.</strong> Booking.com invited tool vendors to embed with teams, share expertise, and help tackle specific business challenges.</p></li></ul><p><strong>Driving cultural change</strong></p><ul><li><p><strong>AI adoption required cultural as well as technical change.</strong> Simply granting licenses didn&#8217;t work; Booking.com needed to integrate AI into its development culture.</p></li><li><p><strong>Leadership enablement was crucial.</strong> Leaders who experienced AI firsthand became stronger advocates for adoption across their business units.</p></li><li><p><strong>Coding sessions with leaders built momentum.</strong> These sessions energized managers, who returned to their teams pushing for wider use.</p></li><li><p><strong>Slack communities lowered friction.</strong> Developers felt more comfortable asking questions in Slack than in formal settings, which revealed education gaps and sped up problem-solving.</p></li><li><p><strong>Cross-unit collaboration 
improved.</strong> AI became a catalyst for teams across different business units to work together more closely.</p></li></ul><p><strong>Measuring impact while key questions remain</strong></p><ul><li><p><strong>Early &#8220;hours saved&#8221; metrics weren&#8217;t enough.</strong> Surveys about time savings were too subjective and based on too few users.</p></li><li><p><strong>Booking.com shifted to broader measures.</strong> They focused on speed, efficiency, quality, cycle time, and re-platforming progress.</p></li><li><p><strong>AI users created more&#8212;but lighter&#8212;merge requests.</strong> Daily users produced 30% more MRs that were 70% smaller, but it is not yet clear whether this improves quality or efficiency.</p></li><li><p><strong>Human review remains essential.</strong> Every MR still requires two reviewers, which protects against poor code slipping through.</p></li><li><p><strong>Key questions remain open.</strong> Is AI-assisted code more readable, efficient, and secure? Does it reduce bugs and vulnerabilities? These answers are still undetermined.</p></li></ul><p><strong>Unblocking organizational bottlenecks</strong></p><ul><li><p><strong>Procurement and risk reviews slowed down experimentation.</strong> Developers wanted to try every new AI tool, but legal and security created friction.</p></li><li><p><strong>A central &#8220;committee&#8221; was formed to unblock adoption.</strong> This group reviewed requests weekly, streamlined approvals, and prevented each business unit from duplicating effort.</p></li><li><p><strong>Fast-tracking proofs of concept became a priority.</strong> Developers could get hands-on with new tools sooner, while procurement caught up in the background.</p></li><li><p><strong>The platform mindset helped.</strong> Booking.com treated AI like any other developer tool, centralizing pain points and ensuring consistency across the company.</p></li></ul><p><strong>Tracking current adoption trends</strong></p><ul><li><p><strong>60% of developers now use AI tools.</strong> This places Booking.com in the top 25% of companies globally for adoption.</p></li><li><p><strong>Daily users ship more frequently.</strong> Developers using AI at least three times a week create 30% more merge requests.</p></li><li><p><strong>Usage patterns are evolving.</strong> While more MRs are being created, many are lighter, and Booking.com is still investigating what that means for long-term quality.</p></li><li><p><strong>Adoption is uneven across units.</strong> Some business units are ahead, while others are still cautious, reflecting differences in culture, legacy systems, and leadership buy-in.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=69s">01:09</a>) Bruno&#8217;s role at Booking.com and an overview of the business</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=139s">02:19</a>) Booking.com&#8217;s goals when introducing AI tooling</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=206s">03:26</a>) Why Booking.com set such an ambitious innovation ratio goal</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=406s">06:46</a>) The beginning of Booking.com&#8217;s journey with AI</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=534s">08:54</a>) Why the initial adoption of Cody was low</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=797s">13:17</a>) How 
education and enablement fueled adoption</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=948s">15:48</a>) The importance of a top-down cultural change for AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=1058s">17:38</a>) The ongoing journey of determining the right metrics</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=1304s">21:44</a>) Measuring the longer-term impact of AI</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=1624s">27:04</a>) How Booking.com solved internal bottlenecks to testing new tools</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=1930s">32:10</a>) Booking.com&#8217;s framework for evaluating new tools</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=2150s">35:50</a>) The state of adoption at Booking.com and efforts to expand AI use</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=2227s">37:07</a>) What&#8217;s still undetermined about AI&#8217;s impact on PR/MR quality</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=2388s">39:48</a>) How Booking.com is addressing lagging adoption and monitoring churn</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=2604s">43:24</a>) How Booking.com&#8217;s Slack community lowers friction for questions and support</p><p>(<a href="https://www.youtube.com/watch?v=HuDtj_xHhis&amp;t=2675s">44:35</a>) Closing thoughts on what&#8217;s next for Booking.com&#8217;s AI plan</p><p><strong>Where to find Bruno Passos:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/brpassos/">https://www.linkedin.com/in/brpassos/</a></p><p>&#8226; X: <a href="https://x.com/brunopassos">https://x.com/brunopassos</a></p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; X: <a href="https://x.com/rhein_wein">https://x.com/rhein_wein</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p>&#8226; Laura&#8217;s course (Measuring Engineering Performance and AI Impact) <a href="https://lauratacho.com/developer-productivity-metrics-course">https://lauratacho.com/developer-productivity-metrics-course</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/research/measuring-ai-code-assistants-and-agents/?utm_source=podcast">Measuring AI code assistants and agents</a></p></li><li><p><a href="https://getdx.com/core-4-reporting/">DX Core 4 Framework</a></p></li><li><p><a href="https://www.booking.com/">Booking.com</a></p></li><li><p><a href="https://sourcegraph.com/search">Sourcegraph Search</a></p></li><li><p><a href="https://sourcegraph.com/cody">Cody | AI coding assistant from Sourcegraph</a></p></li><li><p><a href="https://www.linkedin.com/in/greysonjunggren/">Greyson Junggren - DX | LinkedIn</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Measuring AI code assistants and agents with the AI Measurement Framework ]]></title><description><![CDATA[A data-driven guide to measuring developer productivity in the AI era, cutting through hype to track real impact and ensure sustainable gains.]]></description><link>https://newsletter.getdx.com/p/measuring-ai-code-assistants-and</link><guid isPermaLink="false">https://newsletter.getdx.com/p/measuring-ai-code-assistants-and</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 15 Aug 2025 15:23:17 GMT</pubDate><enclosure 
url="https://api.substack.com/feed/podcast/170802939/377d1d4d0ccd93ec38024880a1ecf37f.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/6A-dp1Kzyjk">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this episode of <em>Engineering Enablement</em>, DX CEO<strong> </strong>Abi Noda and I share practical guidance for measuring developer productivity in the AI era using our AI Measurement Framework. Based on research with industry leaders, vendors, and hundreds of organizations, we walk through how to cut through the hype and make informed decisions about AI adoption.</p><p>We talk about which fundamentals of productivity measurement remain unchanged, why metrics like acceptance rate can be misleading, and how to track AI&#8217;s real impact across utilization, quality, and cost. We also cover how to measure agentic workflows, expand the definition of &#8220;developer&#8221; to include AI-enabled contributors, and identify second-order effects before they create long-term problems.</p><p>If you&#8217;re introducing AI coding tools, exploring autonomous agents, or just trying to separate signal from noise, this episode offers a clear, actionable roadmap for using data to ensure AI delivers sustainable, meaningful gains.</p><div id="youtube2-6A-dp1Kzyjk" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;6A-dp1Kzyjk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/6A-dp1Kzyjk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>AI&#8217;s hype vs. 
reality gap</strong></p><ul><li><p><strong>Bold headlines are often misleading.</strong> Claims like &#8220;90% of code will be written by AI&#8221; typically come from cherry-picked studies in narrow scenarios, not representative of the median developer experience.</p></li><li><p><strong>Organizations need their own data.</strong> Vendor marketing and public research can set unrealistic expectations&#8212;measuring AI&#8217;s real-world impact in your own environment is the only way to guide strategy and investment.</p></li></ul><p><strong>AI doesn&#8217;t change engineering fundamentals</strong></p><ul><li><p><strong>Core principles remain the same.</strong> Scalability, maintainability, reliability, and meeting customer needs still define good engineering.</p></li><li><p><strong>AI builds on&#8212;not replaces&#8212;these foundations.</strong> Use AI to lift existing strengths, not as an excuse to rebuild productivity measurement from scratch.</p></li></ul><p><strong>The AI Measurement Framework</strong></p><ul><li><p><strong>Three dimensions matter most:</strong> utilization (how widely AI is used), impact (how it changes performance), and cost (what you spend on tools, licenses, training).</p></li><li><p><strong>Track them together for the full picture.</strong> Over-indexing on one&#8212;like utilization&#8212;can lead to false conclusions about overall value.</p></li></ul><p><strong>The pitfalls of acceptance rate</strong></p><ul><li><p><strong>Acceptance rate is unreliable.</strong> AI code that&#8217;s accepted in the IDE is often rewritten, heavily modified, or deleted before shipping.</p></li><li><p><strong>Better options exist.</strong> Tagging PRs for AI contributions or using file-level observability can identify AI-authored changes across all IDEs and tools, avoiding blind spots.</p></li></ul><p><strong>Collecting measurement data</strong></p><ul><li><p><strong>Tool telemetry</strong> (from GitHub, GitLab, or AI vendors) shows patterns in daily and weekly adoption.</p></li><li><p><strong>Quarterly surveys</strong> reveal long-term trends in developer satisfaction, productivity, and maintainability perceptions.</p></li><li><p><strong>In-workflow experience sampling</strong> asks targeted questions at the moment of work&#8212;e.g., &#8220;Was this PR authored with AI?&#8221;&#8212;to get precise, low-bias feedback.</p></li></ul><p><strong>Perception vs. 
reality in time savings</strong></p><ul><li><p><strong>Developers often feel faster with AI&#8212;but logs may say otherwise.</strong> A meta-study found that self-reports overstated gains; in some cases, AI users were slower.</p></li><li><p><strong>Triangulate survey and system data</strong> to confirm that perceived improvements match actual throughput and quality metrics.</p></li></ul><p><strong>Measuring agentic workflows</strong></p><ul><li><p><strong>Treat agents as team extensions, not digital employees.</strong> Measure productivity for the combined human-agent team, just as you would for a team using CI/CD tools like Jenkins.</p></li><li><p><strong>Focus on maturity, not just usage.</strong> There&#8217;s a big difference between using AI for autocomplete and delegating multi-step tasks to autonomous loops.</p></li></ul><p><strong>Expanding the definition of developer</strong></p><ul><li><p><strong>AI enables more contributors.</strong> Designers, PMs, and other non-engineers can now produce functional code and prototypes.</p></li><li><p><strong>Apply the same quality gates</strong>&#8212;code review, testing, maintainability checks&#8212;to their contributions as to full-time engineers.</p></li></ul><p><strong>Thinking beyond AI</strong></p><ul><li><p><strong>AI is one tool in the toolbox.</strong> Many bottlenecks&#8212;like unclear requirements, inefficient processes, and infrastructure delays&#8212;can&#8217;t be solved by AI alone.</p></li><li><p><strong>Balance investment</strong> to ensure you&#8217;re addressing all productivity levers, not just AI adoption.</p></li></ul><p><strong>Watching for second-order effects</strong></p><ul><li><p><strong>More AI-generated code can create new bottlenecks.</strong> Extra output can slow PR reviews, increase cognitive load, and lower maintainability.</p></li><li><p><strong>Impact metrics reveal trade-offs</strong> early, helping you prevent short-term speed gains from causing long-term technical debt.</p></li></ul><p><strong>Rolling out metrics successfully</strong></p><ul><li><p><strong>Aggregate at team or department level.</strong> Avoid individual tracking to build trust and reduce fear around AI adoption.</p></li><li><p><strong>Be transparent about data use</strong> so developers know it&#8217;s for enablement, tool evaluation, and rollout strategy&#8212;not performance surveillance.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=86s">01:26</a>) The challenge of measuring developer productivity in the AI age</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=257s">04:17</a>) Measuring productivity in the AI era &#8212; what stays the same and what changes</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=445s">07:25</a>) How to use DX&#8217;s AI Measurement Framework</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=790s">13:10</a>) Measuring AI&#8217;s true impact from adoption rates to long-term quality and maintainability</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=991s">16:31</a>) Why acceptance rate is flawed &#8212; and DX&#8217;s approach to tracking AI-authored code</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=1105s">18:25</a>) Three ways to gather measurement data</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=1315s">21:55</a>) How Google measures time savings and why 
self-reported data is misleading</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=1465s">24:25</a>) How to measure agentic workflows and a case for expanding the definition of developer</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=1730s">28:50</a>) A case for not overemphasizing AI&#8217;s role</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=1831s">30:31</a>) Measuring second-order effects</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=1946s">32:26</a>) Audience Q&amp;A: applying metrics in practice</p><p>(<a href="https://www.youtube.com/watch?v=6A-dp1Kzyjk&amp;t=2205s">36:45</a>) Wrap up: best practices for rollout and communication</p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><p>&#8226; Substack: &#8203;&#8203;<a href="https://substack.com/@abinoda">https://substack.com/@abinoda</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://getdx.com/whitepaper/ai-measurement-framework/">Measuring AI code assistants and agents</a></p></li><li><p><a href="https://www.businessinsider.com/ai-google-engineers-coding-productive-sundar-pichai-alphabet-2025-6">AI is making Google engineers 10% more productive, says Sundar Pichai - Business Insider</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[How to cut through the hype and measure AI’s real impact (Live from LeadDev London)]]></title><description><![CDATA[Live from LeadDev London, this episode explores how leaders can cut through AI hype using data from 39,000 developers and two key frameworks to improve developer experience and drive impact.]]></description><link>https://newsletter.getdx.com/p/how-to-cut-through-the-hype-and-measure</link><guid isPermaLink="false">https://newsletter.getdx.com/p/how-to-cut-through-the-hype-and-measure</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 08 Aug 2025 15:07:17 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/170023686/bf9da44ecae90d6a3c2e21f9d929ab26.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/qZv0YOoRLmg">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this special episode of the Engineering Enablement podcast, recorded live at LeadDev London, I unpack the gap between AI hype and engineering reality&#8212;and how leaders can use data to close it.</p><p>I share the latest insights from nearly 39,000 developers across 184 companies, walk through the Core 4 and AI Measurement Frameworks, and explain how to use them together to measure what matters, improve developer experience, and drive real organizational impact&#8212;without getting lost in the noise.</p><div id="youtube2-qZv0YOoRLmg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;qZv0YOoRLmg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div 
class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/qZv0YOoRLmg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>The AI hype cycle vs. ground truth</strong></p><ul><li><p><strong>The &#8220;disappointment gap&#8221;</strong> refers to the widening space between sensational AI headlines and the lived reality of teams on the ground. Organizations are being pushed to move faster with AI, yet few have defined what success even looks like.</p></li><li><p><strong>Headlines touting &#8220;90% of code written by AI&#8221; inflate expectations and erode trust</strong>. Developers feel let down when tools don&#8217;t deliver on the hype. Executives, in turn, expect exponential productivity gains without understanding what&#8217;s realistically achievable.</p></li><li><p><strong>The best way to close this gap is with data</strong>. Leaders need to ground their AI strategies in facts, not forecasts.</p></li></ul><p><strong>AI&#8217;s current role in high-performing engineering orgs</strong></p><ul><li><p><strong>In the top quartile of organizations, around 60% of developers are now using AI tools daily or weekly.</strong> However, this usage does not translate directly into AI generating most of the code.</p></li><li><p>These organizations are seeing the best results because they invest in enablement, support, and identifying practical use cases that actually work.</p></li><li><p><strong>Across nearly 39,000 developers at 184 companies, the average reported time savings from AI use is 3 hours and 45 minutes per week.</strong> It&#8217;s a meaningful uplift, but not a silver bullet.</p></li></ul><p><strong>Engineering leaders must shape the narrative</strong></p><ul><li><p><strong>Engineering leaders are also business leaders</strong>&#8212;and they need to take on the responsibility of educating peers and execs on what AI adoption actually looks like.</p></li><li><p>Effective leaders can clearly answer:</p><ol><li><p>How is our organization performing today?</p></li><li><p>How is AI helping&#8212;or not helping?</p></li><li><p>What are we doing next to improve?</p></li></ol></li></ul><p><strong>Back to basics: what defines engineering excellence?</strong></p><ul><li><p><strong>A shared definition of engineering performance is essential</strong> before measuring the effects of AI. The <strong>DX Core 4</strong> framework offers this foundation.</p></li><li><p><strong>Core 4 combines elements of DORA, SPACE, and DevEx into a single, balanced model </strong>with four key dimensions: <strong>speed</strong>, <strong>effectiveness</strong>, <strong>quality</strong>, and <strong>impact</strong>.</p></li><li><p><strong>These metrics must be evaluated together.</strong> Optimizing one at the expense of another (e.g., speed at the cost of quality) risks destabilizing the system.</p></li></ul><p><strong>Developer experience drives performance outcomes</strong></p><ul><li><p><strong>Developer experience is the strongest performance lever available to engineering organizations. 
</strong>The <strong>DXI</strong> (Developer Experience Index) measures 14 evidence-based drivers of experience and correlates directly with time savings.</p></li><li><p><strong>For each DXI point gained, developers save 13 minutes per week.</strong> While that may seem small, the impact scales dramatically across teams: a 10-point gain across 1,000 developers, for example, works out to roughly 2,200 hours recovered per week.</p></li><li><p><strong>Block used DXI to identify 500,000 hours lost annually</strong> due to friction&#8212;data that directly shaped their investment decisions and enabled faster delivery without compromising quality.</p></li></ul><p><strong>A complementary framework for measuring AI</strong></p><ul><li><p>The <strong>AI Measurement Framework</strong> adds clarity by tracking the effect of AI across three pillars: <strong>utilization</strong>, <strong>impact</strong>, and <strong>cost</strong>.</p></li><li><p>Utilization captures how broadly and consistently AI tools are being used. The biggest gains typically come when teams move from occasional to consistent usage.</p></li><li><p><strong>Time savings per week is the metric the industry is most aligned on</strong> for measuring impact.</p></li><li><p><strong>Cost includes not just licenses but also investment in training and enablement</strong>&#8212;areas that are often overlooked but essential for success.</p></li></ul><p><strong>Using both frameworks together creates clarity and confidence</strong></p><ul><li><p><strong>Core 4 </strong>answers: &#8220;What does high performance look like?&#8221;</p></li><li><p><strong>The AI Measurement Framework </strong>answers: &#8220;How is AI affecting that performance?&#8221;</p></li><li><p><strong>Together, these frameworks enable leaders to move beyond guesswork</strong> and act with clarity, especially during times of rapid change.</p></li></ul><p><strong>AI is a multiplier&#8212;but only with the right foundations</strong></p><ul><li><p><strong>Accelerating software delivery with AI is possible, but it requires strong fundamentals in place</strong>. 
Cutting corners on quality, stability, or developer experience for short-term gains can create long-term damage.</p></li><li><p>When grounded in solid frameworks and real data, AI can improve velocity, collaboration, and developer satisfaction without compromising core engineering values.</p></li></ul><p><strong>Better software faster is possible</strong>&#8212;not by chasing hype, but by aligning teams on what matters and measuring what works.</p><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg">00:00</a>) Intro: Laura&#8217;s keynote from LDX3</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=104s">01:44</a>) The problem with asking &#8220;how much faster can we go with AI?&#8221;</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=182s">03:02</a>) How the disappointment gap creates barriers to AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=380s">06:20</a>) What AI adoption looks like at top-performing organizations</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=473s">07:53</a>) What leaders must do to turn AI into meaningful impact</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=650s">10:50</a>) Why building better software with AI still depends on fundamentals</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=723s">12:03</a>) An overview of the DX Core 4 Framework</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=802s">13:22</a>) Why developer experience is the biggest performance lever</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=912s">15:12</a>) How Block used Core 4 and DXI to identify 500,000 hours in time savings</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=968s">16:08</a>) How to get started with Core 4</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=1052s">17:32</a>) Measuring AI with the AI Measurement Framework</p><p>(<a href="https://www.youtube.com/watch?v=qZv0YOoRLmg&amp;t=1305s">21:45</a>) Final takeaways and how to get started with confidence</p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; X: <a href="https://x.com/rhein_wein">https://x.com/rhein_wein</a></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://leaddev.com/leaddev-london/">LDX3 by LeadDev | The Festival of Software Engineering Leadership | London</a></p></li><li><p><a href="https://www.youtube.com/watch?v=EO3_qN_Ynsk">Software engineering with LLMs in 2025: reality check</a></p></li><li><p><a href="https://getdx.com/podcast/developer-productivity-at-microsoft/">SPACE framework, PRs per engineer, AI research</a></p></li><li><p><a href="https://getdx.com/podcast/brian-houck-ai-adoption-playbook/">The AI adoption playbook: Lessons from Microsoft's internal strategy</a></p></li><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://nicolefv.com/">Nicole Forsgren</a></p></li><li><p><a href="https://www.margaretstorey.com/">Margaret-Anne Storey</a></p></li><li><p><a href="https://www.dropbox.com/">Dropbox.com</a></p></li><li><p><a href="https://www.etsy.com/">Etsy</a></p></li><li><p><a href="https://www.pfizer.com/">Pfizer</a></p></li><li><p><a href="https://www.linkedin.com/in/drewhouston/">Drew 
Houston - Dropbox | LinkedIn</a></p></li><li><p><a href="https://block.xyz/">Block</a></p></li><li><p><a href="https://cursor.com/">Cursor</a></p></li><li><p><a href="https://dora.dev/">Dora.dev</a></p></li><li><p><a href="https://sourcegraph.com/">Sourcegraph</a></p></li><li><p><a href="https://www.booking.com/">Booking.com</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Unpacking METR’s findings: Does AI slow developers down?]]></title><description><![CDATA[Quentin Anthony unpacks why AI coding tools slow developers despite boosting perceived productivity, and how teams can improve results with task&#8209;level fit and better digital hygiene.]]></description><link>https://newsletter.getdx.com/p/unpacking-metrs-findings-does-ai</link><guid isPermaLink="false">https://newsletter.getdx.com/p/unpacking-metrs-findings-does-ai</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 01 Aug 2025 15:02:53 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169255748/9d58e0579bffafcef2c04b101d8bcfdb.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/NRizG0tSGic">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this episode, I&#8217;m joined by <a href="https://www.linkedin.com/in/quentin-anthony/">Quentin Anthony</a>, Head of Model Training at <strong>Zyphra</strong> and a participant in METR&#8217;s recent study on AI coding tools. We explore the study&#8217;s unexpected findings&#8212;why developers often felt more productive using AI, but in many cases weren&#8217;t&#8212;and unpack the nuances of where these tools actually add value. 
Quentin offers practical, experience-backed advice on avoiding common pitfalls, such as the sunk-cost fallacy and context rot, evaluating task-level fit, and building the kind of tool hygiene that&#8217;s critical for long-term success with AI.</p><div id="youtube2-NRizG0tSGic" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;NRizG0tSGic&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/NRizG0tSGic?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>The biggest takeaways from the METR study</strong></p><ul><li><p><strong>Quentin participated in a recent study by METR</strong> (Model Evaluation and Threat Research), which found that on average, developers were slower when using AI&#8212;despite feeling more productive.</p></li><li><p>The gap between perceived and actual efficiency is real&#8212;and often overlooked, especially when AI feels fun to use.</p></li></ul><ul><li><p><strong>AI excels at documentation, unit tests, and refactoring</strong>&#8212;tasks that can often be completed in a single prompt.</p></li><li><p>For complex, low-level work&#8212;like GPU kernels or distributed systems&#8212;<strong>models tend to produce bloated code</strong> or require too much back-and-forth to be useful.</p></li></ul><p><strong>Time-boxing is key to avoiding the sunk-cost fallacy and watching out for context rot</strong></p><ul><li><p><strong>Quentin recommends setting strict time limits when using AI</strong>: if it&#8217;s not helping in 10&#8211;15 minutes, move on.</p></li><li><p>Developers often spend too long trying to force a model to help with the wrong kind of task.</p></li></ul><ul><li><p><strong>Long chats and overloaded prompts can confuse models</strong>, causing hallucinations and inconsistent behavior.</p></li><li><p>Restarting chats frequently and summarizing past work helps keep models grounded and accurate.</p></li><li><p><strong>Quentin recommends using summarization prompts to distill the current chat</strong> before restarting&#8212;this keeps context clean while avoiding repetition.</p></li></ul><p><strong>AI tools introduce more idle time than you think</strong></p><ul><li><p><strong>Waiting on model responses</strong>&#8212;even just 15&#8211;30 seconds&#8212;adds up fast, especially with reasoning-heavy prompts.</p></li><li><p><strong>Quentin uses this downtime for microtasks and blocks distractions</strong>, such as social media, to stay in the flow.</p></li></ul><p><strong>Prompting skill helps&#8212;but isn&#8217;t everything</strong></p><ul><li><p><strong>Success with AI isn&#8217;t just about writing better prompts</strong>. Often, task-model mismatch is the real problem.</p></li><li><p>Blaming developers for tool failures ignores deeper limitations in model training and context handling.</p></li><li><p><strong>Quentin treats AI output like junior engineer code&#8212;carefully reviewed, never blindly trusted</strong>. 
Even when correct, AI-generated code is often bloated or hard to maintain.</p></li></ul><p><strong>Focus on task-level fit, not team-level rollout</strong></p><ul><li><p><strong>Organizations should evaluate AI usefulness at the task level</strong>, not by team, codebase, or tooling preference.</p></li><li><p><strong>Not all work benefits from AI</strong>&#8212;even within the same repo or project.</p></li><li><p><strong>In Quentin&#8217;s view, tasks like acceptance tests, PR review, or boilerplate generation</strong> are ideal candidates for model support&#8212;while planning and design should stay human-led.</p></li></ul><p><strong>Model behavior varies&#8212;test before trusting</strong></p><ul><li><p><strong>Different models excel at different things</strong>. Claude may outperform Gemini at writing comments, while Gemini might be better at summarizing code.</p></li><li><p><strong>Quentin tries new models on familiar tasks first</strong> and expands use only after carefully watching for failure patterns.</p></li><li><p><strong>Claude is strong at writing clean, human-readable code</strong>.</p></li><li><p><strong>Gemini 2.5 is particularly good at summarizing</strong>. Quentin picks models based on task-specific strengths rather than defaulting to one tool.</p></li></ul><p><strong>Tool sprawl creates friction and other limits</strong></p><ul><li><p>Constantly switching between AI tools leads to confusion and inconsistent results.</p></li><li><p><strong>Quentin keeps his toolset small and stable</strong>, adjusting slowly to new platforms to avoid cognitive overload.</p></li><li><p>When adopting new tools, Quentin starts with familiar, low-risk tasks like unit tests, then gradually expands usage as he learns how the model behaves.</p></li></ul><ul><li><p><strong>Multi-agent systems are exciting, but they are still unreliable</strong>. 
They perform well in narrow settings, but struggle in real-world workflows.</p></li><li><p>For now, you&#8217;ll get more value from well-scoped tools and clearly defined use cases.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=92s">01:32</a>) A brief overview of Quentin&#8217;s background and current work</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=125s">02:05</a>) An explanation of METR and the study Quentin participated in</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=662s">11:02</a>) Surprising results of the METR study</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=767s">12:47</a>) Quentin&#8217;s takeaways from the study&#8217;s results</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=990s">16:30</a>) How developers can avoid bloated code bases through self-reflection</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=1171s">19:31</a>) Signs that you&#8217;re not making progress with a model</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=1285s">21:25</a>) What is &#8220;context rot&#8221;?</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=1384s">23:04</a>) Advice for combating context rot</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=1534s">25:34</a>) How to make the most of your idle time as a developer</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=1693s">28:13</a>) Developer hygiene: the case for selectively using AI tools</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=2008s">33:28</a>) How to interact effectively with new models</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=2128s">35:28</a>) Why organizations should focus on tasks that AI handles well</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=2281s">38:01</a>) Where AI fits in the software development lifecycle</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=2380s">39:40</a>) How to approach testing with models</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=2431s">40:31</a>) What makes models different</p><p>(<a href="https://www.youtube.com/watch?v=NRizG0tSGic&amp;t=2525s">42:05</a>) Quentin&#8217;s thoughts on agents</p><p><strong>Where to find Quentin Anthony:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/quentin-anthony/">https://www.linkedin.com/in/quentin-anthony/</a></p><p>&#8226; X: <a href="https://x.com/QuentinAnthon15">https://x.com/QuentinAnthon15</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://www.zyphra.com/">Zyphra</a></p></li><li><p><a href="https://www.eleuther.ai/">EleutherAI</a></p></li><li><p><a href="https://metr.org/">METR</a></p></li><li><p><a href="https://cursor.com/">Cursor</a></p></li><li><p><a href="https://claude.ai/">Claude</a></p></li><li><p><a href="https://www.librechat.ai/">LibreChat</a></p></li><li><p><a href="https://gemini.google.com/">Google Gemini</a></p></li><li><p><a href="https://openai.com/index/introducing-o3-and-o4-mini/">Introducing OpenAI o3 and o4-mini</a></p></li><li><p><a 
href="https://newsletter.getdx.com/p/metr-study-on-how-ai-affects-developer-productivity">METR&#8217;s study on how AI affects developer productivity</a></p></li><li><p><a href="https://x.com/QuentinAnthon15/status/1943948791775998069">Quentin Anthony on X: "I was one of the 16 devs in this study."</a></p></li><li><p><a href="https://news.ycombinator.com/item?id=44310054">Context rot from Hacker News</a></p></li><li><p><a href="https://www.anthropic.com/research/tracing-thoughts-language-model">Tracing the thoughts of a large language model</a></p></li><li><p><a href="https://www.kimi.com/">Kimi</a></p></li><li><p><a href="https://x.ai/news/grok-4">Grok 4 | xAI</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[CarGurus’ journey building a developer portal and increasing AI adoption]]></title><description><![CDATA[Frank Fodera shares how CarGurus broke up a monolith, rolled out AI tools, and built their own internal developer portal&#8212;tying it all back to real business impact.]]></description><link>https://newsletter.getdx.com/p/cargurus-journey-building-a-developer</link><guid isPermaLink="false">https://newsletter.getdx.com/p/cargurus-journey-building-a-developer</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 11 Jul 2025 15:20:56 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/167926629/59c48cdc41c6a058003084476a8c3753.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/Xvn6HXBVuaM">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In today&#8217;s episode, I caught up with Frank Fodera from <strong>CarGurus</strong> to hear how his team tackled two massive challenges: breaking up a monolith and rolling out AI tools across the engineering org. Frank shared how they built their internal developer portal, Showroom, from scratch&#8212;why they didn&#8217;t buy a tool like Backstage, how it supports their dev teams, and how they tie it all back to business impact. We also discussed their AI experiments, the metrics they&#8217;re tracking, and how they&#8217;re driving adoption without imposing one-size-fits-all solutions.</p><div id="youtube2-Xvn6HXBVuaM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Xvn6HXBVuaM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Xvn6HXBVuaM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>Internal developer portals: from spreadsheet to strategic platform</strong></p><ul><li><p><strong>Started with a spreadsheet.</strong> CarGurus built their internal developer portal, <em>Showroom</em>, in 2019. 
The initial goal was simple: solve ownership confusion during a monolith-to-microservices transition.</p></li><li><p><strong>Showroom now powers five key pillars:</strong></p><ul><li><p><strong>Discoverability:</strong> Centralized source of truth for services, jobs, and ownership</p></li><li><p><strong>Governance:</strong> Dynamic compliance checks to ensure &#8220;Golden Path&#8221; adoption</p></li><li><p><strong>Self-serviceability:</strong> Spin up new services, subscribe to topics, and more</p></li><li><p><strong>Transparency:</strong> Access logs, service data, and operational info in one place</p></li><li><p><strong>Operational efficiency:</strong> Reduced cognitive load and friction across the SDLC</p></li></ul></li><li><p><strong>Interfaces with infra but abstracts complexity.</strong> Showroom sits on top of observability, alerting, and infra tools, providing a consistent experience even as backend systems evolve.</p></li><li><p><strong>Evolved by solving real problems, not chasing trends.</strong> Rather than set out to &#8220;build an IDP,&#8221; the team invested in Showroom whenever it accelerated strategic initiatives&#8212;and saw massive returns.</p></li><li><p><strong>Impact: 75-day setup cycles cut to &lt;3 days.</strong> By leaning on automated workflows, best-practice templates, and centralized access to information, CarGurus dramatically increased service creation velocity.</p></li></ul><p><strong>Build vs. buy: why CarGurus chose to go homegrown</strong></p><ul><li><p><strong>Started before Backstage existed.</strong> No off-the-shelf solution could meet their requirements when the initiative began.</p></li><li><p><strong>Kept evaluating even after Backstage launched.</strong> But concluded the effort to customize it would be equal to enhancing their own tool.</p></li><li><p><strong>Customization was critical.</strong> Their IDP integrates deeply with internal systems and niche tools not typically supported by commercial solutions.</p></li><li><p><strong>Consistent UX across tools.</strong> Showroom provides a single pane of glass&#8212;developers don&#8217;t need to know what&#8217;s happening under the hood as tools change.</p></li></ul><p><strong>AI coding assistants: a platform-level initiative</strong></p><ul><li><p><strong>Bake-off between three tools.</strong> CarGurus ran a head-to-head comparison across multiple AI assistants. Feedback showed different tools excelled in different languages and workflows.</p></li><li><p><strong>Standardization isn&#8217;t the goal.</strong> The company is embracing flexibility&#8212;letting teams and engineers choose the best tool for their domain.</p></li><li><p><strong>Qualitative feedback &gt; raw telemetry.</strong> Developer sentiment and perceived time savings are more actionable than latency or code volume alone.</p></li><li><p><strong>Measuring impact across six dimensions:</strong></p><ul><li><p><em>Speed</em>: Diffs per engineer</p></li><li><p><em>Efficiency</em>: Developer Experience Index (DXI)</p></li><li><p><em>Satisfaction</em>: CSAT scores</p></li><li><p><em>Adoption</em>: Tool usage tracking</p></li><li><p><em>Time savings</em>: Self-reported</p></li><li><p><em>Burnout prevention</em>: Monitoring job satisfaction</p></li></ul></li><li><p><strong>Maintainability dropped slightly.</strong> AI-generated code may feel less &#8220;owned,&#8221; even if it&#8217;s correct. 
A known and acceptable trade-off so far.</p></li></ul><p><strong>Driving adoption: what&#8217;s working</strong></p><ul><li><p><strong>Leadership buy-in was immediate.</strong> Execs were eager to invest in efficiency. AI adoption was viewed as a strategic imperative.</p></li><li><p><strong>Champions are key.</strong> Internal AI &#8220;power users&#8221; share tips via tech talks and videos to scale best practices organically.</p></li><li><p><strong>Internal marketing makes a difference.</strong> Just telling developers they had access wasn&#8217;t enough. Framing AI tooling as a must-try (not optional) led to a meaningful adoption spike.</p></li><li><p><strong>Focus is on education and experimentation.</strong> Goal is to help developers integrate AI into their daily flow&#8212;not just try it once and move on.</p></li><li><p><strong>Expected ROI? ~15&#8211;30% gains</strong> in efficiency in early stages, with room to grow as usage matures and more advanced agent-based tools are explored.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM">00:00</a>) Intro: IDPs (Internal Developer Portals) and AI</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=127s">02:07</a>) The IDP journey at CarGurus</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=353s">05:53</a>) A breakdown of the people responsible for building the IDP</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=425s">07:05</a>) The five pillars of the Showroom IDP</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=552s">09:12</a>) How DevX worked with infrastructure</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=673s">11:13</a>) The business impact of Showroom</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=837s">13:57</a>) The transition from monolith to microservices and struggles along the way</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=954s">15:54</a>) The benefits of building a custom IDP</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=1150s">19:10</a>) How CarGurus drives AI coding tool adoption</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=1728s">28:48</a>) Getting started with an AI initiative</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=1910s">31:50</a>) Metrics to track</p><p>(<a href="https://www.youtube.com/watch?v=Xvn6HXBVuaM&amp;t=2046s">34:06</a>) Tips for driving AI adoption</p><p><strong>Where to find Frank Fodera:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/frankfodera/">https://www.linkedin.com/in/frankfodera/</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://getdx.com/webinar/internal-developer-portals-overview/">Internal Developer Portals: Use Cases and Key Components</a></p></li><li><p><a href="https://learn.microsoft.com/en-us/azure/architecture/patterns/strangler-fig">Strangler Fig Pattern - Azure Architecture Center | Microsoft Learn</a></p></li><li><p><a href="https://backstage.spotify.com/">Spotify for Backstage</a></p></li><li><p><a href="https://getdx.com/podcast/brian-houck-ai-adoption-playbook/">The AI adoption playbook: Lessons from Microsoft's internal 
strategy</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Snowflake’s playbook for operational excellence]]></title><description><![CDATA[Snowflake leaders Gilad Turbahn and Amy Yuan reveal how they drive operational excellence by treating teams like customers and use roadmaps plus advisory boards to align priorities and scale planning.]]></description><link>https://newsletter.getdx.com/p/snowflakes-playbook-for-operational</link><guid isPermaLink="false">https://newsletter.getdx.com/p/snowflakes-playbook-for-operational</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 20 Jun 2025 13:10:01 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/166180567/232cfee7f79097a9620d2c8b819e0664.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/e8nJFsgZsLU">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this episode, I talk with Gilad Turbahn, Head of Developer Productivity, and Amy Yuan, Director of Engineering at <strong>Snowflake</strong>, about how they approach operational excellence. We dig into how they build trust with engineering teams, the communication rhythms that keep their org aligned, and how they treat internal teams like customers. Gilad and Amy also share how Snowflake uses roadmaps, advisory boards, and direct feedback to drive priorities&#8212;and how they&#8217;re evolving their planning practices to scale with the company.</p><div id="youtube2-e8nJFsgZsLU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;e8nJFsgZsLU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/e8nJFsgZsLU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Some takeaways: </strong></h2><p><strong>What operational excellence looks like at Snowflake</strong></p><ul><li><p>Do what you say you&#8217;ll do&#8212;on time&#8212;to build trust.</p></li><li><p>Treat internal teams like customers.</p></li><li><p>Use documentation and help channels to support engineering teams at scale.</p></li></ul><p><strong>Creating a strong feedback loop</strong></p><ul><li><p>Feedback is only valuable if it&#8217;s followed up on&#8212;Snowflake closes the loop with a tracking system (logging each person they&#8217;ve spoken to) and a weekly ops review.</p></li><li><p>Show how the feedback is reflected in the roadmap, or explain why it isn&#8217;t.</p></li></ul><p><strong>Snowflake&#8217;s five-part communication rhythm</strong></p><ul><li><p>Weekly newsletters</p></li><li><p>Surveys to gather and prioritize needs</p></li><li><p>Interviews across levels</p></li><li><p>Roadshows and all-hands</p></li><li><p>A mix of top-down and bottom-up adoption strategies</p></li></ul><p><strong>Barriers to operational excellence</strong></p><ul><li><p>Lagging adopters and resistance to change</p></li><li><p>Wrong leadership style for the team or individual&#8212;leadership should be tailored to the needs of each individual and team.</p></li></ul><p><strong>Customer engagement, redefined</strong></p><ul><li><p>Treat engineers like customers&#8212;tailor time and communication to their needs.</p></li><li><p>Attend team all-hands, host listening sessions, and use data to show trade-offs.</p></li><li><p>Snowflake adjusts session length from quick 5-minute syncs to deep 60-minute sessions depending on what&#8217;s needed.</p></li></ul><p><strong>Converting detractors into allies</strong></p><ul><li><p>Snowflake PMs are highly technical and dive deep into engineers&#8217; pain points.</p></li><li><p>They walk through problems collaboratively and validate concerns.</p></li><li><p>Customer Advisory Boards help create evangelists from within.</p></li></ul><p><strong>Customer Advisory Boards at Snowflake</strong></p><ul><li><p>Boards are built from diverse teams and levels&#8212;both promoters and detractors.</p></li><li><p>They look for people who are vocal.</p></li><li><p>Members are either identified in interviews or nominated by directors.</p></li><li><p>Meetings are informal, with no pressure to attend every single one.</p></li><li><p>They use Slack to keep the feedback loop active.</p></li><li><p>Moderate effectively: at Snowflake, the process is to introduce a roadmap, design, quarterly plan, or survey results, then ask members what is top of mind.</p></li></ul><p><strong>Planning and goal-setting at Snowflake</strong></p><ul><li><p>Planning at Snowflake is done both annually and quarterly.</p></li><li><p>They balance OKRs with a long-term North Star.</p></li><li><p>They are currently experimenting with &#8220;thematic prioritization,&#8221; which organizes planning around a few main themes that can make a real difference to customers.</p></li></ul><h2><strong>In this episode, we cover:</strong></h2><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU">00:00</a>) Intro: an overview of operational excellence</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=253s">04:13</a>) Obstacles to executing with operational excellence</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=351s">05:51</a>) An overview of the Snowflake playbook for operational excellence</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=505s">08:25</a>) Who does the work of reaching out to customers</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=546s">09:06</a>) The importance of customer engagement</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=619s">10:19</a>) How Snowflake does customer engagement</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=853s">14:13</a>) The types of feedback received and the two camps (supporters and detractors)</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=1015s">16:55</a>) How to influence detractors and how detractors actually help</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=1107s">18:27</a>) Using insiders as messengers</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=1368s">22:48</a>) An overview of Snowflake&#8217;s customer advisory board</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=1570s">26:10</a>) The importance of meeting in person (learnings from Warsaw and Berlin office visits)</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=1688s">28:08</a>) Managing up</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=1807s">30:07</a>) How planning is done at Snowflake</p><p>(<a href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=2185s">36:25</a>) Setting targets for OKRs, and Snowflake&#8217;s philosophy on metrics</p><p>(<a 
href="https://www.youtube.com/watch?v=e8nJFsgZsLU&amp;t=2362s">39:22</a>) The annual plan and how it&#8217;s shared</p><p><strong>Where to find Amy Yuan:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/amy-yuan-a8ba783/">https://www.linkedin.com/in/amy-yuan-a8ba783/</a></p><p><strong>Where to find Gilad Turbahn:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/giladturbahn/">https://www.linkedin.com/in/giladturbahn/</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><h2><strong>Referenced:</strong></h2><ul><li><p><a href="https://getdx.com/podcast/developer-productivity-at-snowflake/">CTO buy-in, measuring sentiment, and customer focus</a></p></li><li><p><a href="https://www.snowflake.com/">Snowflake</a></p></li><li><p><a href="https://www.linkedin.com/in/benoit-dageville-3011845/">Benoit Dageville - Snowflake Computing | LinkedIn</a></p></li><li><p><a href="https://www.linkedin.com/in/thierry-cruanes-3927363/">Thierry Cruanes - Snowflake Computing | LinkedIn</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[The biggest obstacles preventing GenAI adoption — and how to overcome them]]></title><description><![CDATA[DX CTO Laura Tacho shares how leaders can overcome fear and hype to drive AI adoption and measure real impact in engineering teams.]]></description><link>https://newsletter.getdx.com/p/the-biggest-obstacles-preventing</link><guid isPermaLink="false">https://newsletter.getdx.com/p/the-biggest-obstacles-preventing</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 06 Jun 2025 14:09:29 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/165050079/eb0cb6e3505038a003182238bf40e52a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/lkDsuOkJ0Rw">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><p>In this episode, I&#8217;m joined by DX CTO Laura Tacho to talk about what&#8217;s really holding back AI adoption in engineering teams. It&#8217;s not the technical challenges&#8212;it&#8217;s fear, unclear expectations, and the disconnect between hype and reality. Laura shares practical strategies for making progress, from modeling usage at the leadership level to creating space for experimentation. 
We also talk about how to measure impact effectively&#8212;including why it&#8217;s critical to establish a baseline before introducing AI tools, so you can track real changes over time.</p><div id="youtube2-lkDsuOkJ0Rw" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;lkDsuOkJ0Rw&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/lkDsuOkJ0Rw?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>Some takeaways: </strong></p><p><strong>Obstacles to AI adoption</strong></p><ul><li><p>The hype: Some of the claims written on LinkedIn and other places online overstate the impact of AI coding tools.</p></li><li><p>The cost of AI tools.</p></li><li><p>Technical barriers aren&#8217;t holding back adoption&#8212;cultural and human resistance are.</p></li></ul><p><strong>Some AI adoption stats</strong></p><ul><li><p><strong>Top-end organizations report that 60-70% of developers are using code assistants either daily or weekly.</strong></p></li><li><p>Less than 60% of code is written with AI at Microsoft.</p></li><li><p>There is a steady increase in the adoption of AI tools.</p></li></ul><p><strong>Strategies for driving AI adoption</strong></p><ul><li><p><strong>Work against the hype by showing what the real impact of AI tools is</strong>. Methods for demonstrating impact include: self-reported, telemetry-based, direct signals from developers, indirect signals using hard data, and quality measures.</p></li><li><p>There isn&#8217;t a special set of metrics for AI adoption: We still care about quality, developer experience, and business impact.</p></li><li><p>Model and encourage AI use from the top down.</p></li><li><p>Remove the fear and stigma from using AI: Reassure employees that using AI isn&#8217;t cheating.</p></li><li><p>Carve out time for employees to experiment with AI tools.</p></li><li><p>Have clear conversations about expectations for AI use.</p></li><li><p>Host webinars, in-person trainings, and office hours.</p></li><li><p>Champions programs: Identify the early adopters to help evangelize and drive adoption with their peers.</p></li><li><p>Use DX&#8217;s <a href="https://getdx.com/guide/ai-assisted-engineering/">Guide to AI assisted engineering</a>.</p></li><li><p><strong>DORA&#8217;s recent AI report shows a 451% increase in adoption among companies with an acceptable use policy.</strong></p></li></ul><p><strong>Key measures from the DX Core 4 productivity framework</strong></p><ul><li><p><strong>For quality: </strong>Measure change failure rate before and after introducing AI tools.</p></li><li><p><strong>For speed</strong>: Measure PR throughput.</p></li><li><p><strong>For quality:</strong> Survey developers. 
Many report improved code readability at companies with high AI adoption.</p></li><li><p><strong>Always stay on top of developer experience.</strong></p></li></ul><p><strong>In this episode, we cover:</strong></p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw">00:00</a>) Intro: The full spectrum of AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=182s">03:02</a>) The hype of AI</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=286s">04:46</a>) Some statistics around the current state of AI coding tool adoption</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=447s">07:27</a>) The real barriers to AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=571s">09:31</a>) How to drive AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=947s">15:47</a>) Measuring AI&#8217;s impact</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=1189s">19:49</a>) More strategies for driving AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=1434s">23:54</a>) The methods companies are actually using to drive impact</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=1755s">29:15</a>) Questions from the chat</p><p>(<a href="https://www.youtube.com/watch?v=lkDsuOkJ0Rw&amp;t=2388s">39:48</a>) Wrapping up</p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><p><strong>Referenced:</strong></p><ul><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://getdx.com/podcast/brian-houck-ai-adoption-playbook/">The AI adoption playbook: Lessons from Microsoft's internal strategy</a></p></li><li><p><a href="https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-30-of-the-companys-code-was-written-by-ai/">Microsoft CEO says up to 30% of the company's code was written by AI | TechCrunch</a></p></li><li><p><a href="https://www.forbes.com/sites/douglaslaney/2025/04/09/selling-ai-strategy-to-employees-shopify-ceos-manifesto/">Viral Shopify CEO Manifesto Says AI Now Mandatory For All Employees</a></p></li><li><p><a href="https://dora.dev/research/ai/gen-ai-report/">DORA | Impact of Generative AI in Software Development</a></p></li><li><p><a href="https://getdx.com/guide/ai-assisted-engineering/">Guide to AI assisted engineering</a></p></li><li><p><a href="https://www.linkedin.com/in/justinreock/">Justin Reock - DX | LinkedIn</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[DORA’s latest research on AI impact]]></title><description><![CDATA[Google&#8217;s Derek DeBellis shares how generative AI is impacting developer productivity, team flow, and how to measure its real value.]]></description><link>https://newsletter.getdx.com/p/doras-latest-research-on-ai-impact</link><guid isPermaLink="false">https://newsletter.getdx.com/p/doras-latest-research-on-ai-impact</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 23 May 2025 15:29:30 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/163979446/0e5ec43763cf60d1edd858876d1e3a22.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and 
watch now on <strong><a href="https://youtu.be/i3_GQx-5YRc">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><div id="youtube2-i3_GQx-5YRc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;i3_GQx-5YRc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/i3_GQx-5YRc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>DORA just released a new report on the impact of generative AI on software development productivity, and the findings were a mix of expected and surprising.</p><p>On today&#8217;s episode, I&#8217;m joined once again by <a href="https://www.linkedin.com/in/derekdebellis/">Derek DeBellis</a>, lead researcher on <strong>Google&#8217;s DORA</strong> <strong>team</strong>, to break down the key insights. We talk about how the survey was developed, why measuring productivity is so complex, and what the data actually tells us about how AI is affecting real-world teams.</p><p><strong>Some takeaways: </strong></p><p><strong>The research design process</strong></p><ul><li><p>How DORA designed survey questions to validate hypotheses, including the careful iteration of single words.</p></li><li><p>An explanation of &#8220;flow&#8221; and why the DORA team decided to use a general definition of flow for the survey.</p></li><li><p>How dev time was broken into two buckets: time spent on toilsome work vs. valuable work&#8212;and what those self-reported numbers reveal.</p></li><li><p>If accuracy matters, measure time as close to the present as possible&#8212;same day or &#8220;right now&#8221;.</p></li><li><p>You don&#8217;t need perfect accuracy to learn a lot from data.</p></li><li><p>How DORA&#8217;s definition of productivity is inspired by the book <em>Slow Productivity</em>, and tied to value creation.</p></li></ul><p><strong>Positive findings from the Gen AI report</strong></p><ul><li><p>Productivity is likely to increase by 2.1% when individual AI adoption is increased by 25%&#8212;about 20 minutes.</p></li><li><p>How a 2.1% increase in productivity per individual could be dramatic across an entire organization.</p></li></ul><p><strong>Interesting and surprising findings from the report</strong></p><ul><li><p>The contradictory finding that time spent doing valuable work went down with AI is easily explained: AI makes that work go faster; it doesn&#8217;t actually reduce the amount of valuable work.</p></li><li><p>While AI increases personal productivity (e.g., 2.1% boost), it correlates with a 1.5% drop in delivery throughput and a 7.2% decline in stability. The good news is that some of this may be a short-term adjustment period as teams learn to work with new constraints. 
</p></li></ul><p><strong>Guidance for measuring AI&#8217;s impact on productivity</strong></p><ul><li><p>Make sure all metrics are multi-faceted and aligned with your organization&#8217;s goals.</p></li><li><p>Don&#8217;t fall into the common trap of overestimating how much AI tools boost productivity.</p></li><li><p>Use discrepancies between the current state and goals to inform strategy.</p></li><li><p>Beware of short-term disorientation when adopting new tools.</p></li></ul><p><strong>In this episode, we cover:</strong></p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc">00:00</a>) Intro: DORA&#8217;s new Impact of Gen AI report</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=204s">03:24</a>) The methodology used to put together the surveys DORA used for the report</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=404s">06:44</a>) An example of how a single word can throw off a question</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=479s">07:59</a>) How DORA measures flow</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=638s">10:38</a>) The two ways time was measured in the recent survey</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=870s">14:30</a>) An overview of experiential surveying</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=974s">16:14</a>) Why DORA asks about time</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=1190s">19:50</a>) Why Derek calls survey results &#8216;observational data&#8217;</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=1309s">21:49</a>) Interesting findings from the report</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=1457s">24:17</a>) DORA&#8217;s definition of productivity</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=1582s">26:22</a>) Why a 2.1% increase in individual productivity is significant</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=1800s">30:00</a>) The report&#8217;s findings on decreased team delivery throughput and stability</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=1960s">32:40</a>) Tips for measuring AI&#8217;s impact on productivity</p><p>(<a href="https://www.youtube.com/watch?v=i3_GQx-5YRc&amp;t=2300s">38:20</a>) Wrap up: understanding the data</p><p><strong>Where to find Derek DeBellis:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/derekdebellis/">https://www.linkedin.com/in/derekdebellis/</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><p><strong>Referenced:</strong></p><ul><li><p><a href="https://dora.dev/research/ai/gen-ai-report/">DORA | Impact of Generative AI in Software Development</a></p></li><li><p><a href="https://getdx.com/podcast/dora-research-google/">The science behind DORA</a></p></li><li><p><a href="https://nihrecord.nih.gov/2020/03/20/yale-professor-divulges-strategies-happy-life">Yale Professor Divulges Strategies for a Happy Life</a></p></li><li><p><a href="https://www.cognitionandculture.net/blogs/olivier-morin/incredible-listening-to-when-im-64-makes-you-forget-your-age/index.html">Incredible! 
Listening to &#8216;When I&#8217;m 64&#8217; makes you forget your age</a></p></li><li><p><a href="https://www.amazon.com/Slow-Productivity-Accomplishment-Without-Burnout/dp/0593544854">Slow Productivity: The Lost Art of Accomplishment without Burnout</a></p></li><li><p><a href="https://getdx.com/guide/dora-space-devex/">DORA, SPACE, and DevEx: Which framework should you use?</a></p></li><li><p><a href="https://getdx.com/podcast/developer-productivity-at-microsoft/">SPACE framework, PRs per engineer, AI research</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Setting targets for developer productivity metrics]]></title><description><![CDATA[Laura Tacho, CTO at DX, explains how engineering teams can effectively use the Core 4 framework by focusing on controllable inputs instead of arbitrary metrics.]]></description><link>https://newsletter.getdx.com/p/setting-targets-for-developer-productivity</link><guid isPermaLink="false">https://newsletter.getdx.com/p/setting-targets-for-developer-productivity</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 09 May 2025 13:37:56 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/162590012/3315c8320c1e1ef84786a3e4fb2c8d20.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/5T1YPaS5s2o">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><div id="youtube2-5T1YPaS5s2o" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;5T1YPaS5s2o&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/5T1YPaS5s2o?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>In today&#8217;s episode, I&#8217;m joined by <strong><a href="https://www.linkedin.com/in/lauratacho/">Laura Tacho</a></strong>, CTO at DX, engineering leadership coach, and creator of the Core 4 framework.</p><p>We discuss why many frameworks&#8212;such as SPACE, DORA, and even Core 4&#8212;can go awry when teams focus on the wrong metrics. We explain how to get started with Core 4 by anchoring on controllable inputs rather than arbitrary targets. 
Whether you're just beginning with metrics or trying to course-correct a bloated system, this is a sharp, practical guide to implementing Core 4 in a way that actually works.</p><p><strong>Some Takeaways: </strong></p><p><strong>Common pitfalls with engineering metrics frameworks</strong></p><ul><li><p>Frameworks like SPACE, DORA, and Core 4 often fail when implemented without clear context.</p></li><li><p>Teams sometimes adopt metrics haphazardly, setting targets that don&#8217;t align with their specific challenges.</p></li><li><p>Some metrics may be irrelevant for certain teams&#8212;or easily gamed.</p></li><li><p>Metric overwhelm is common: tracking too many metrics leads to confusion and dilution of focus.</p></li><li><p>Leaders often fail to communicate why specific metrics are chosen or how they tie to business goals.</p></li></ul><p><strong>Goodhart&#8217;s Law: </strong>&#8220;When a measure becomes a target, it ceases to be a good measure.&#8221;</p><ul><li><p>Teams may distort or manipulate metrics to hit arbitrary goals.</p></li><li><p>Example: Reducing bug counts as a target can lead to underreporting rather than real quality improvement.</p></li></ul><p><strong>Input vs. output metrics</strong></p><ul><li><p>Input metrics are actions teams can directly control (e.g., time spent on new capabilities).</p></li><li><p>Output metrics (e.g., delivery speed, reliability) are influenced by many factors and harder to manage directly.</p></li><li><p>Focusing on inputs gives teams a clearer path to improvement without encouraging unhealthy behaviors.</p></li></ul><p><strong>How to implement Core 4 well</strong></p><ul><li><p>Start small&#8212;track a few metrics before expanding.</p></li><li><p>Don&#8217;t set targets until you&#8217;ve established a baseline and understand what&#8217;s realistically controllable.</p></li><li><p>Aim for the 75th percentile to push for improvement while avoiding unrealistic pressure.</p></li><li><p>Use metrics to create a culture of reflection and continuous improvement, not judgment.</p></li></ul><p><strong>How to avoid gamification</strong></p><ul><li><p>Use multidimensional metrics to avoid tunnel vision.</p></li><li><p>Focus on input metrics.</p></li><li><p>Reward effort, learning, and progress&#8212;not just hitting numeric goals.</p></li><li><p>Include teams in the goal-setting process to increase buy-in and reduce manipulation.</p></li><li><p>Give teams space and time to make real progress.</p></li></ul><p><strong>In this episode, we cover:</strong></p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o">00:00</a>) Intro: Improving systems, not distorting data</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=140s">02:20</a>) Goal setting with the new Core 4 framework</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=481s">08:01</a>) A quick primer on Goodhart&#8217;s law</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=602s">10:02</a>) Input vs. output metrics&#8212;and why targeting outputs is problematic</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=818s">13:38</a>) A health analogy demonstrating input vs. 
output</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=1023s">17:03</a>) A look at how the key input metrics in Core 4 drive output metrics</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=1448s">24:08</a>) How to counteract gamification</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=1704s">28:24</a>) How to get developer buy-in</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=1848s">30:48</a>) The number of metrics to focus on</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=1964s">32:44</a>) Helping leadership and teams connect the dots to how input goals drive output</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=2120s">35:20</a>) Demonstrating business impact</p><p>(<a href="https://www.youtube.com/watch?v=5T1YPaS5s2o&amp;t=2290s">38:10</a>) Best practices for goal setting</p><p><strong>Where to find Laura Tacho:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/lauratacho/">https://www.linkedin.com/in/lauratacho/</a></p><p>&#8226; Website: <a href="https://lauratacho.com/">https://lauratacho.com/</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><p><strong>Referenced:</strong></p><ul><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://getdx.com/podcast/">Engineering Enablement Podcast</a></p></li><li><p><a href="https://dora.dev/guides/dora-metrics-four-keys/">DORA&#8217;s software delivery metrics: the four keys</a></p></li><li><p><a href="https://getdx.com/research/space-of-developer-productivity/">The SPACE of Developer Productivity: There&#8217;s more to it than you think</a></p></li><li><p><a href="https://getdx.com/research/devex-what-actually-drives-productivity/">DevEx: What Actually Drives Productivity</a></p></li><li><p><a href="https://getdx.com/guide/dora-space-devex/">DORA, SPACE, and DevEx: Which framework should you use?</a></p></li><li><p><a href="https://en.wikipedia.org/wiki/Goodhart%27s_law">Goodhart's law</a></p></li><li><p><a href="https://www.linkedin.com/in/nicolefv/">Nicole Forsgren - Microsoft | LinkedIn</a></p></li><li><p><a href="https://en.wikipedia.org/wiki/Campbell%27s_law">Campbell's law</a></p></li><li><p><a href="https://www.lennysnewsletter.com/p/introducing-core-4-the-best-way-to">Introducing Core 4: The best way to measure and improve your product velocity</a></p></li><li><p><a href="https://getdx.com/podcast/dx-core-4-framework-overview/">DX Core 4: Framework overview, key design principles, and practical applications</a></p></li><li><p><a href="https://newsletter.getdx.com/p/dx-core-4-2024-benchmarks">DX Core 4: 2024 benchmarks - by Abi Noda</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[The AI adoption playbook: Lessons from Microsoft's internal strategy]]></title><description><![CDATA[Brian Houck from Microsoft shares how engineering teams can drive AI adoption through leadership, local champions, and clear, practical strategies.]]></description><link>https://newsletter.getdx.com/p/the-ai-adoption-playbook-lessons</link><guid isPermaLink="false">https://newsletter.getdx.com/p/the-ai-adoption-playbook-lessons</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Fri, 18 Apr 2025 14:50:05 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/160663869/c7f39dff35ea976bfe5ab09be2c1c2ec.mp3" length="0" 
type="audio/mpeg"/><content:encoded><![CDATA[<p>Listen and watch now on <strong><a href="https://youtu.be/c51ToE4pPpY">YouTube</a>, <a href="https://podcasts.apple.com/us/podcast/engineering-enablement-by-abi-noda/id1619140476">Apple</a>, and <a href="https://open.spotify.com/show/3NxjyIsuxeDMQtisDqBy7D">Spotify</a></strong>.</p><div id="youtube2-c51ToE4pPpY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;c51ToE4pPpY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/c51ToE4pPpY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong><a href="https://www.linkedin.com/in/brianhouck/">Brian Houck</a></strong> from Microsoft returns to discuss effective strategies for driving AI adoption among software development teams. Brian shares his insights into why the immense hype around AI often serves as a barrier rather than a facilitator for adoption, citing skepticism and inflated expectations among developers. He highlights the most effective approaches, including leadership advocacy, structured training, and cultivating local champions within teams to demonstrate practical use cases.</p><p>Brian emphasizes the importance of honest communication about AI's capabilities, avoiding over-promises, and ensuring that teams clearly understand what AI tools are best suited for. Additionally, he discusses common pitfalls, such as placing excessive pressure on individuals through leaderboards and unrealistic mandates, and stresses the importance of framing AI as an assistant rather than a replacement for developer skills. Finally, Brian explores the role of data and metrics in adoption efforts, offering practical advice on how to measure usage effectively and sustainably.</p><p><strong>Some Takeaways: </strong></p><p><strong>Barriers to AI Adoption</strong></p><ul><li><p>The biggest obstacle to AI adoption in engineering teams is not a lack of tooling or access, but developer skepticism rooted in hype and unrealistic expectations.</p></li><li><p><strong>30% of developers say their primary concern is that AI tools will not deliver on their promises, which discourages initial or continued use.</strong></p></li><li><p>When developers try an AI tool and don&#8217;t immediately experience transformational results, they often abandon it entirely.</p></li><li><p><strong>Broader narratives&#8212;like AI replacing developers&#8212;add to the skepticism, even though only about 10% of developers are actually concerned about job displacement.</strong></p></li></ul><p><strong>The Role of Leadership</strong></p><ul><li><p><strong>Leadership advocacy is a powerful lever for AI adoption. 
Developers are seven times more likely to be daily users when leaders actively promote and normalize the use of AI tools.</strong></p></li><li><p>Leaders should clearly communicate which AI tools are approved, what kinds of tasks they&#8217;re suitable for, and that developers are encouraged (and expected) to use them.</p></li><li><p>Messaging that overstates the capabilities of AI tools tends to backfire, creating disappointment and further resistance.</p></li><li><p>Ongoing communication&#8212;rather than one-off announcements&#8212;is essential to build trust and maintain momentum.</p></li></ul><p><strong>Team-Level Strategies and Local Champions</strong></p><ul><li><p>Adoption is more successful when it&#8217;s supported at the team level through peer learning, not just centralized rollouts.</p></li><li><p>Developers benefit most when &#8220;local champions&#8221;&#8212;respected, experienced team members&#8212;show how they are using AI tools in the context of real workflows.</p></li><li><p><strong>Organizations that rely on local champions for internal knowledge-sharing see about 22% greater adoption among developers.</strong></p></li><li><p>Brown bag sessions, peer demos, and informal walkthroughs are especially effective for making AI adoption relevant and practical.</p></li><li><p>Managers play a key role in identifying early adopters who are both enthusiastic and influential within their teams.</p></li></ul><p><strong>Measurement and Data-Driven Adoption</strong></p><ul><li><p>Successful adoption efforts rely on clear, ongoing measurement&#8212;tracking who has installed AI tools, who is actively using them, and how frequently.</p></li><li><p><strong>Rather than boiling engagement down to a single score, Brian recommends tracking usage at multiple tiers: daily, weekly, monthly, and lapsed users.</strong> (A rough sketch of this bucketing follows this list.)</p></li><li><p>Dashboards and scorecards help leaders visualize progress and encourage healthy, team-level competition.</p></li><li><p>Monitoring usage across groups&#8212;not individuals&#8212;helps foster psychological safety while still promoting accountability and improvement.</p></li></ul>
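<p>To make the tiered view concrete, here is a hedged sketch of the bucketing; the thresholds, names, and data shape are illustrative assumptions, not definitions from the episode.</p><pre><code class="language-python"># Hedged sketch of tiered AI-tool usage buckets. The thresholds and the
# input shape (developer -&gt; days since last use) are assumptions.
from collections import Counter

def usage_tier(days_since_last_use: int) -&gt; str:
    if days_since_last_use &lt;= 1:
        return "daily"
    if days_since_last_use &lt;= 7:
        return "weekly"
    if days_since_last_use &lt;= 30:
        return "monthly"
    return "lapsed"  # installed at some point, but no recent use

# Hypothetical telemetry: days since each developer last used the tool.
last_use = {"ana": 0, "ben": 3, "chi": 12, "dev": 45, "eli": 1}

# Report counts at the group level, not per individual, to preserve
# psychological safety while still showing movement between tiers.
tiers = Counter(usage_tier(d) for d in last_use.values())
print(dict(tiers))  # e.g. {'daily': 2, 'weekly': 1, 'monthly': 1, 'lapsed': 1}
</code></pre>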
<p><strong>Common Pitfalls and Developer Concerns</strong></p><ul><li><p>Public leaderboards or pressuring individual developers to adopt AI tools can lead to disengagement or mistrust.</p></li><li><p><strong>Developers often worry that AI-generated code may introduce bugs, vulnerabilities, or degrade quality. This concern is second only to the fear that AI is overhyped.</strong></p></li><li><p>Some developers fear their skills may atrophy if they rely too heavily on AI; this highlights the need to reframe AI as a partner in development, not a replacement.</p></li><li><p>Organizations should address these concerns head-on through honest, nuanced messaging that balances enthusiasm with realism.</p></li></ul><p><strong>Long-Term Impact and Ongoing Research</strong></p><ul><li><p><strong>Initial studies show AI tools can improve code-writing efficiency by 5% to 30%, but broader productivity gains are harder to measure due to the multifaceted nature of software engineering work.</strong></p></li><li><p>Brian highlights the need for more research on how AI impacts long-term code maintainability, especially when it accelerates initial development phases.</p></li><li><p>Developers still spend only about 14% of their time writing code, so increasing throughput in that slice doesn&#8217;t automatically translate into overall productivity gains (even a 30% speedup applied to that 14% nets only about a 3% reduction in total time).</p></li></ul><p><strong>In this episode, we cover:</strong></p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY">00:00</a>) Intro: Why AI hype can hinder adoption among teams</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=107s">01:47</a>) Key strategies companies use to successfully implement AI</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=287s">04:47</a>) Understanding why adopting AI tools is uniquely challenging</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=429s">07:09</a>) How clear and consistent leadership communication boosts AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=646s">10:46</a>) The value of team leaders ("local champions") demonstrating practical AI use</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=866s">14:26</a>) Practical advice for identifying and empowering team champions</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=991s">16:31</a>) Common mistakes companies make when encouraging AI adoption</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=1161s">19:21</a>) Simple technical reminders and nudges that encourage AI use</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=1224s">20:24</a>) Effective ways to track and measure AI usage through dashboards</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=1398s">23:18</a>) Working with team leaders and infrastructure teams to promote AI tools</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=1460s">24:20</a>) Understanding when to shift from adoption efforts to sustained use</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=1559s">25:59</a>) Insights into the real-world productivity impact of AI</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=1672s">27:52</a>) Discussing how AI affects long-term code maintenance</p><p>(<a href="https://www.youtube.com/watch?v=c51ToE4pPpY&amp;t=1742s">29:02</a>) Updates on ongoing research linking sleep quality to productivity</p><p><strong>Where to find Brian Houck:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/brianhouck/">https://www.linkedin.com/in/brianhouck/</a></p><p>&#8226; Website: <a href="https://www.microsoft.com/en-us/research/people/bhouck/">https://www.microsoft.com/en-us/research/people/bhouck/</a></p><p><strong>Where to find Abi Noda:</strong></p><p>&#8226; LinkedIn: <a
href="https://www.linkedin.com/in/abinoda">https://www.linkedin.com/in/abinoda</a></p><p><strong>Referenced:</strong></p><ul><li><p><a href="https://getdx.com/corefour">DX Core 4 Productivity Framework</a></p></li><li><p><a href="https://getdx.com/podcast/">Engineering Enablement Podcast</a></p></li><li><p><a href="https://cloud.google.com/devops/dora">DORA Metrics</a></p></li><li><p><a href="https://dropbox.tech/">Dropbox Engineering Blog</a></p></li><li><p><a href="https://codeascraft.com/">Etsy Engineering Blog</a></p></li><li><p><a href="https://www.pfizer.com/science/innovation/digital">Pfizer Digital Innovation</a></p></li><li><p><a href="https://www.atlassian.com/team-playbook/plays/brown-bag">Brown Bag Sessions &#8211; A Guide</a></p></li><li><p><a href="https://visualstudio.microsoft.com/services/github-copilot/">IDE Integration and AI Tools</a></p></li><li><p><a href="https://getdx.com/developer-productivity-dashboards/">Developer Productivity Dashboard Examples</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Gene Kim on developer experience and AI engineering]]></title><description><![CDATA[Author and researcher Gene Kim explores how developer experience, platform engineering, and AI are reshaping the way organizations build software and coordinate work.]]></description><link>https://newsletter.getdx.com/p/gene-kim-on-developer-experience</link><guid isPermaLink="false">https://newsletter.getdx.com/p/gene-kim-on-developer-experience</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Mon, 07 Apr 2025 14:35:10 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/160783026/a2d7e59d63cbf3cf5fa649c7542efb40.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Author and researcher <a href="https://www.linkedin.com/in/realgenekim/">Gene Kim</a> has spent decades shaping how organizations think about DevOps, systems, and performance. In this conversation, he makes the case for why developer experience is the next chapter in that evolution and how it connects to everything from coordination theory to option value to the rise of GenAI. 
We discuss the deeper systems thinking behind DevEx, why tooling alone isn&#8217;t enough, and what AI means for the structure of engineering teams going forward.</p><ul><li><p>The evolving landscape of developer experience</p></li><li><p>Option value theory and why it matters</p></li><li><p>Framing developer experience effectively</p></li><li><p>GenAI's impact on development teams</p></li><li><p>Where developer experience and AI are headed</p></li></ul><p>Listen now on <a href="https://podcasts.apple.com/us/podcast/gene-kim-on-developer-experience-and-ai-engineering/id1619140476?i=1000702242678">Apple</a>, <a href="https://open.spotify.com/episode/58Fl70DnElJoBQT6TFBNZf">Spotify</a>, and <a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y">YouTube</a>.</p><div id="youtube2-4Vp_Wnpvg_E" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;4Vp_Wnpvg_E&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/4Vp_Wnpvg_E?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Some takeaways:</p><ul><li><p><strong>Developer experience represents the next chapter in transformation journeys.</strong> Organizations increasingly view developer experience and developer platforms as the logical next progression after DevOps. This evolution isn't random but part of a coherent pattern addressing fundamental coordination challenges.</p></li><li><p><strong>The core problem is managing an ever-growing number of functional specialties.</strong> As technology becomes more complex, organizations face challenges coordinating efforts between increasingly specialized roles. What once required simple coordination now demands sophisticated social "circuitry."</p></li><li><p><strong>Platforms solve the "30 tickets" problem.</strong> When developers need to open tickets with 30 different people to get anything done, productivity stalls. Developer platforms eliminate these coordination bottlenecks by providing self-service capabilities.</p></li><li><p><strong>Focus on layer three changes.</strong> The most significant performance improvements come from changing the "social circuitry" at layer three (how people work together) rather than just changing tools or technology at layer two. Toyota's NUMMI plant transformation demonstrated this by achieving dramatic improvements with the same people and equipment.</p></li><li><p><strong>The value of options can be expressed mathematically.</strong> The formula NK/T (where N is the number of modules, K is the number of parallel experiments, and T is experiment duration) shows how organizations can increase value through experimentation. (A small worked example follows this list.)</p></li><li><p><strong>GenAI increases option value by reducing T.</strong> AI helps developers complete work faster, enabling more experiments in less time and increasing the overall option value of development work.</p></li><li><p><strong>Higher uncertainty makes options more valuable.</strong> In uncertain environments (high sigma), the ability to defer decisions until you have more information becomes increasingly valuable.
This makes experimentation and fast feedback especially important.</p></li><li><p><strong>GenAI enables five key capabilities for developers.</strong> It helps them: 1) work faster, 2) be more ambitious about what they can build, 3) accomplish alone what would have required teams, 4) have more fun coding, and 5) take more "swings at bat" through rapid experimentation.</p></li><li><p><strong>Don't frame DevEx as "coddling developers."</strong> When business leaders react negatively to developer experience initiatives, reframe the conversation around what happens without good DevEx: valuable engineers can't build, test, deploy, or get anything done despite their skills.</p></li><li><p><strong>Position DevEx as a painkiller, not a vitamin.</strong> The business cost of talented developers spending hours wrestling with Docker images instead of creating value is substantial. This is a productivity problem, not a perks problem.</p></li><li><p><strong>Talent retention depends on DevEx.</strong> One financial institution lost top Silicon Valley hires within 30 days because they were still waiting on laptops, environments, and completing compliance training. DevEx is critical to retaining the best talent.</p></li><li><p><strong>The value creation happens in development.</strong> While platforms and infrastructure are important, the majority of value creation happens in ideation, research, design, and development. Fast feedback through excellent tooling is a critical prerequisite for this value creation.</p></li><li><p><strong>GenAI is changing the shape of software engineering teams.</strong> The traditional makeup of engineering organizations is being fundamentally altered by AI capabilities that augment developer skills and productivity.</p></li><li><p><strong>The "death of the stubborn developer" is underway.</strong> Tasks that companies previously relied on junior developers for can now be quickly completed by senior developers with AI. Gene gives an example from one company where a project that would once have been a summer internship was completed by a department head in an afternoon.</p></li><li><p><strong>The &#8220;novice optional&#8221; problem threatens talent pipelines.</strong> Similar to surgical robots that eliminate the need for junior surgeons' assistance, AI may reduce opportunities for junior developers to gain experience. This could starve the path for juniors to become seniors.</p></li><li><p><strong>Technology leaders are returning to IC roles.</strong> The increased productivity and enjoyment of coding with AI is drawing leaders back to individual contributor positions. Many prefer creating value directly over administrative management tasks.</p></li><li><p><strong>GenAI opens new horizons for all skill levels.</strong> For juniors, AI serves as an infinitely patient teacher. For seniors, it elevates their capabilities and allows them to focus on higher-level architecture and problem decomposition.</p></li><li><p><strong>Teams still need senior engineers to create the &#8220;task tree.&#8221;</strong> While AI can help with leaf node tasks, experienced developers remain essential for creating architecture that enables independent action and problem decomposition.</p></li><li><p><strong>Future research will refine our understanding.</strong> New research is needed to explore how AI impacts both individual developer productivity and the social &#8220;layer three&#8221; of engineering organizations.</p></li><li><p><strong>The CHOP Handbook will codify best practices.</strong> Gene Kim and Steve Yegge are developing guidance on &#8220;Chat Oriented Programming&#8221; to help developers leverage AI for faster, more ambitious, and more enjoyable development work.</p></li></ul>
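<p>To make the NK/T formula concrete, here is a small worked example; the numbers are invented purely for illustration, and the function simply evaluates the relationship described above.</p><pre><code class="language-python"># Worked example of the NK/T relationship: option value grows with more
# modules (N) and parallel experiments (K), and shrinks with longer
# experiment duration (T). All numbers are invented for illustration.
def option_value(n_modules: int, k_experiments: int, t_duration: float) -&gt; float:
    """Relative option value, per the NK/T framing (bigger is better)."""
    return n_modules * k_experiments / t_duration

baseline = option_value(n_modules=10, k_experiments=2, t_duration=4.0)  # 5.0

# GenAI's lever in this framing is T: if experiments finish in half
# the time, option value doubles with no change to N or K.
with_ai = option_value(n_modules=10, k_experiments=2, t_duration=2.0)   # 10.0

print(f"baseline: {baseline:.1f}, with faster experiments: {with_ai:.1f}")
</code></pre>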
<p>Timestamps:</p><ul><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1">0:00</a>) Introduction</p></li><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1&amp;t=132s">2:12</a>) The evolving landscape of developer experience</p></li><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1&amp;t=634s">10:34</a>) Option Value theory, and how GenAI helps developers</p></li><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1&amp;t=825s">13:45</a>) The aim of developer experience work</p></li><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1&amp;t=1199s">19:59</a>) The significance of layer three changes</p></li><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1&amp;t=1403s">23:23</a>) Framing developer experience</p></li><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1&amp;t=1932s">32:12</a>) GenAI&#8217;s part in &#8220;the death of the stubborn developer&#8221;</p></li><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1&amp;t=2165s">36:05</a>) GenAI&#8217;s implications for the workforce</p></li><li><p>(<a href="https://www.youtube.com/watch?v=4Vp_Wnpvg_E&amp;list=PLnWh5v1e2EpZEqEwuguY7WZq3sjbwWx5y&amp;index=1&amp;t=2285s">38:05</a>) Where Gene&#8217;s work is heading</p></li></ul><p>Referenced:</p><ul><li><p><a href="https://itrevolution.com/product/the-phoenix-project/">The Phoenix Project</a></p></li><li><p><a href="https://hbr.org/1999/09/decoding-the-dna-of-the-toyota-production-system">Decoding the DNA of the Toyota Production System</a></p></li><li><p><a href="https://itrevolution.com/product/wiring-the-winning-organization/">Wiring the Winning Organization</a></p></li><li><p><a href="https://itrevolution.com/events/">ETLS Vegas</a></p></li></ul>
href="https://newsletter.getdx.com/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share Engineering Enablement</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Getting Airbnb’s Platform team to drive more impact: Reorganizing, defining strategy, and selecting metrics]]></title><description><![CDATA[Anna Sulkina, Airbnb's Developer Productivity leader, shares how her team improved to drive greater impact across the organization.]]></description><link>https://newsletter.getdx.com/p/getting-airbnbs-platform-team-to</link><guid isPermaLink="false">https://newsletter.getdx.com/p/getting-airbnbs-platform-team-to</guid><dc:creator><![CDATA[Abi Noda]]></dc:creator><pubDate>Sun, 09 Mar 2025 12:50:40 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/158619027/b1ef3e85d7dcaccf3e11833fae36175a.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>When <strong><a href="https://www.linkedin.com/in/annasulkina/">Anna Sulkina</a> </strong>joined Airbnb to lead Developer Platform, there was a feeling amongst the team that they could be even more impactful if they were to become more focused and build better connections with external stakeholders. In this conversation, Anna shares this story and goes into the key changes they made. We cover: </p><ul><li><p>Why they shifted their org structure to be focused on problems, not tools</p></li><li><p>How they clarified roles for PMs, EMs, and tech leads</p></li><li><p>How they defined their organization&#8217;s strategy</p></li><li><p>Other tactics that helped change the general perception about the team</p></li><li><p>What their measurement approach looks like today</p></li><li><p>The measurable improvements they&#8217;ve seen since making these changes</p></li></ul><p>Listen now on <a href="https://podcasts.apple.com/us/podcast/getting-airbnbs-platform-team-to-drive-more-impact/id1619140476?i=1000698317038">Apple</a>, <a href="https://open.spotify.com/episode/0SswYNfvQF11Yt5T80eVwb">Spotify</a>, and <a href="https://youtu.be/SE2OdVlx7XQ?si=UqzBYe2svPl54Ykw">YouTube</a>.</p><div id="youtube2-SE2OdVlx7XQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;SE2OdVlx7XQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/SE2OdVlx7XQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>Some takeaways:</strong></p><ol><li><p>Anna begins by sharing some helpful characteristics for being a developer productivity leader: deep developer empathy, experience building both end-user and internal tools, and a product mindset focused on real pain points. Strong communication and stakeholder management are just as critical as technical expertise.</p></li><li><p>When Anna joined Airbnb, the developer platform team felt they lacked clarity, operated somewhat in silos, and had overlapping roles across PMs, TPMs, and engineering leads. Teams were delivering great work, but without clear priorities or alignment, and stakeholders didn&#8217;t fully grasp the value of developer productivity. 
They tackled this by defining a clear strategy, restructuring teams around developer personas rather than tools, and formalizing stakeholder engagement through regular feedback loops, surveys, and advisory boards.</p></li><li><p>They restructured the org from a tool-centric model to a problem-focused approach, organizing teams into two pillars: platform-agnostic tooling (CI/CD, dev environments) and developer personas (iOS, Android, web). This shift encouraged teams to address broader developer needs rather than just optimizing individual tools, fostering collaboration on shared challenges like CI reliability and integration testing.</p></li><li><p>By clarifying how each role brought value to the company, Anna and her team were able to better define how PMs, tech leads, and EMs collaborated. Simply put, PMs own the &#8220;why,&#8221; tech leads the technical &#8220;how,&#8221; and EMs the organizational &#8220;how.&#8221; They also set clear expectations for different types of tech leads, ensuring alignment wasn&#8217;t just documented but actively reinforced in how teams worked together.</p></li><li><p>With roles clarified, Anna turned to defining overall strategy, starting with their mission: to maximize developer effectiveness by providing a reliable, easy-to-use platform for building and maintaining high-quality software at scale. She set four guiding principles: be developer-centric, simplify workflows, stay focused, and deliver incrementally, then structured the work into four focus areas: speeding up the inner loop, improving CI/CD, reducing cognitive load, and leveraging AI.</p></li><li><p>Under Anna&#8217;s leadership, developer experience at Airbnb has seen double-digit percentage improvements year over year. She says a key driver has been the developer experience survey, now run twice a year with results shared transparently across the org. By consistently gathering feedback, acting on pain points, and showing measurable progress, the team has built trust and momentum.</p></li><li><p>Anna and her team transformed Airbnb&#8217;s developer experience survey into a key decision-making tool. They refined it with industry research, partnered with senior ICs, making them collaborators rather than just stakeholders, and ensured historical continuity. This approach built empathy for the complexity of measuring productivity while incorporating valuable insights. They also improved reporting by slicing data by org level, giving leaders clearer visibility into their teams. 
A new tech culture section helped measure focus time and goal clarity, making the data even more actionable.</p></li></ol><p><strong>In this episode, we cover:</strong></p><ul><li><p>(<a href="https://www.youtube.com/watch?v=SE2OdVlx7XQ&amp;t=0s">0:00</a>) Intro</p></li><li><p>(<a href="https://www.youtube.com/watch?v=SE2OdVlx7XQ&amp;t=100s">1:40</a>) Skills that make a great developer productivity leader</p></li><li><p>(<a href="https://www.youtube.com/watch?v=SE2OdVlx7XQ&amp;t=276s">4:36</a>) Challenges in how the team operated previously</p></li><li><p>(<a href="https://www.youtube.com/watch?v=SE2OdVlx7XQ&amp;t=649s">10:49</a>) Changing the platform org&#8217;s focus and structure</p></li><li><p>(<a href="https://www.youtube.com/watch?v=SE2OdVlx7XQ&amp;t=964s">16:04</a>) Clarifying roles for EMs, PMs, and tech leads</p></li><li><p>(<a href="https://www.youtube.com/watch?v=SE2OdVlx7XQ&amp;t=1222s">20:22</a>) How Airbnb defined its infrastructure org&#8217;s strategy</p></li><li><p>(<a href="https://www.youtube.com/watch?v=SE2OdVlx7XQ&amp;t=1703s">28:23</a>) Improvements they&#8217;ve seen, as measured by developer sentiment</p></li><li><p>(<a href="https://www.youtube.com/watch?v=SE2OdVlx7XQ&amp;t=1933s">32:13</a>) The evolution of Airbnb&#8217;s developer experience survey</p></li></ul><p><strong>Referenced:</strong></p><ul><li><p>Follow <a href="https://www.linkedin.com/in/annasulkina/">Anna</a> on LinkedIn</p></li><li><p>Read <a href="https://medium.com/airbnb-engineering">the Airbnb Tech blog</a></p></li></ul>]]></content:encoded></item></channel></rss>