Measuring AI code assistants and agents with the AI Measurement Framework

A data-driven guide to measuring developer productivity in the AI era, cutting through hype to track real impact and ensure sustainable gains.

Listen and watch now on YouTube, Apple, and Spotify.

In this episode of Engineering Enablement, DX CEO Abi Noda and I share practical guidance for measuring developer productivity in the AI era using our AI Measurement Framework. Based on research with industry leaders, vendors, and hundreds of organizations, we walk through how to cut through the hype and make informed decisions about AI adoption.

We talk about which fundamentals of productivity measurement remain unchanged, why metrics like acceptance rate can be misleading, and how to track AI’s real impact across utilization, quality, and cost. We also cover how to measure agentic workflows, expand the definition of “developer” to include AI-enabled contributors, and identify second-order effects before they create long-term problems.

If you’re introducing AI coding tools, exploring autonomous agents, or just trying to separate signal from noise, this episode offers a clear, actionable roadmap for using data to ensure AI delivers sustainable, meaningful gains.

Some takeaways:

AI’s hype vs. reality gap

  • Bold headlines are often misleading. Claims like “90% of code will be written by AI” typically come from cherry-picked studies in narrow scenarios, not representative of the median developer experience.

  • Organizations need their own data. Vendor marketing and public research can set unrealistic expectations—measuring AI’s real-world impact in your own environment is the only way to guide strategy and investment.

AI doesn’t change engineering fundamentals

  • Core principles remain the same. Scalability, maintainability, reliability, and meeting customer needs still define good engineering.

  • AI builds on—not replaces—these foundations. Use AI to lift existing strengths, not as an excuse to rebuild productivity measurement from scratch.

The AI Measurement Framework

  • Three dimensions matter most: utilization (how widely AI is used), impact (how it changes performance), and cost (what you spend on tools, licenses, training).

  • Track them together for the full picture. Over-indexing on one—like utilization—can lead to false conclusions about overall value.
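
To make tracking the three dimensions together concrete, here is a minimal Python sketch of rolling them up side by side per team. The record shape, field names, and numbers are illustrative assumptions, not a DX schema.

```python
# Minimal sketch: roll up the three framework dimensions side by side.
# The input records and field names are illustrative, not a DX API.
from dataclasses import dataclass

@dataclass
class TeamSnapshot:
    team: str
    devs: int                   # developers on the team
    weekly_ai_users: int        # devs who used an AI tool this week (utilization)
    prs_merged: int             # throughput proxy (impact)
    change_failure_rate: float  # quality proxy (impact)
    ai_spend_usd: float         # licenses + usage for the period (cost)

def summarize(s: TeamSnapshot) -> dict:
    return {
        "team": s.team,
        "utilization": s.weekly_ai_users / s.devs,
        "prs_per_dev": s.prs_merged / s.devs,
        "change_failure_rate": s.change_failure_rate,
        "ai_cost_per_dev": s.ai_spend_usd / s.devs,
    }

for snap in [
    TeamSnapshot("payments", 12, 9, 58, 0.04, 480.0),
    TeamSnapshot("platform", 8, 3, 31, 0.02, 320.0),
]:
    print(summarize(snap))
```

Reviewing utilization, impact, and cost in one view like this is what keeps a high adoption number from being mistaken for proof of value on its own.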

The pitfalls of acceptance rate

  • Acceptance rate is unreliable. AI code that’s accepted in the IDE is often rewritten, heavily modified, or deleted before shipping.
  • Better options exist. Tagging PRs for AI contributions or using file-level observability can identify AI-authored changes across all IDEs and tools, avoiding blind spots.
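
As one illustration of PR tagging (a team convention, not a GitHub or DX feature), this sketch counts merged PRs that carry an assumed `ai-assisted` label via the GitHub REST API:

```python
# Minimal sketch of PR tagging: count merged PRs carrying a team-defined
# "ai-assisted" label via the GitHub REST API. The label name and repo
# are placeholders you would replace with your own convention.
import os
import requests

REPO = "your-org/your-repo"
TOKEN = os.environ["GITHUB_TOKEN"]

resp = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls",
    params={"state": "closed", "per_page": 100},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

merged = [pr for pr in resp.json() if pr["merged_at"]]
ai_assisted = [
    pr for pr in merged
    if any(label["name"] == "ai-assisted" for label in pr["labels"])
]
print(f"{len(ai_assisted)}/{len(merged)} merged PRs tagged as AI-assisted")
```

File-level observability gives finer-grained attribution; a label like this is simply the lowest-effort starting point that already avoids the acceptance-rate blind spot.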

Collecting measurement data

  • Tool telemetry (from GitHub, GitLab, or AI vendors) shows patterns in daily and weekly adoption.

  • Quarterly surveys reveal long-term trends in developer satisfaction, productivity, and maintainability perceptions.

  • In-workflow experience sampling asks targeted questions at the moment of work—e.g., “Was this PR authored with AI?”—to get precise, low-bias feedback.
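
For the experience-sampling idea, here is a small sketch that tallies answers to the "Was this PR authored with AI?" question, assuming your PR template includes a checkbox for it. The checkbox wording is a hypothetical team convention, not part of any tool.

```python
# Minimal sketch of in-workflow experience sampling: a PR template asks
# "Was this PR authored with AI?" as a checkbox, and this script tallies
# the answers from PR descriptions. The checkbox wording is an assumed
# team convention.
import re

CHECKBOX = re.compile(r"- \[(x| )\] This PR was authored with AI", re.IGNORECASE)

def sample_answer(pr_body: str) -> str | None:
    """Return 'yes', 'no', or None if the question wasn't answered."""
    match = CHECKBOX.search(pr_body or "")
    if not match:
        return None
    return "yes" if match.group(1).lower() == "x" else "no"

bodies = [
    "Fixes login bug.\n- [x] This PR was authored with AI",
    "Refactor billing.\n- [ ] This PR was authored with AI",
    "Docs update.",  # question skipped -> counted as unanswered
]
answers = [sample_answer(b) for b in bodies]
print({a: answers.count(a) for a in ("yes", "no", None)})
```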

Perception vs. reality in time savings

  • Developers often feel faster with AI—but logs may say otherwise. A meta-study found that self-reports overstated gains; in some cases, AI users were slower.

  • Triangulate survey and system data to confirm that perceived improvements match actual throughput and quality metrics.
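
A toy sketch of that triangulation, with illustrative team names, numbers, and thresholds (not DX metrics):

```python
# Minimal sketch of triangulation: compare self-reported time savings from
# a quarterly survey with a throughput signal from system data, per team.
surveys = {"payments": 0.25, "platform": 0.30}          # reported fraction of time saved
cycle_time_change = {"payments": -0.18, "platform": 0.05}  # vs. last quarter

for team, reported in surveys.items():
    measured = cycle_time_change[team]
    # A large reported saving alongside flat or worse cycle time is a flag
    # to dig deeper, not proof that developers are wrong.
    mismatch = reported >= 0.15 and measured >= 0.0
    print(f"{team}: reported {reported:.0%} saved, cycle time {measured:+.0%}"
          + ("  <- investigate" if mismatch else ""))
```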

Measuring agentic workflows

  • Treat agents as team extensions, not digital employees. Measure productivity for the combined human-agent team, just as you would for a team using CI/CD tools like Jenkins.

  • Focus on maturity, not just usage. There’s a big difference between using AI for autocomplete and delegating multi-step tasks to autonomous loops.

Expanding the definition of developer

  • AI enables more contributors. Designers, PMs, and other non-engineers can now produce functional code and prototypes.
  • Apply the same quality gates—code review, testing, maintainability checks—to their contributions as to full-time engineers.

Thinking beyond AI

  • AI is one tool in the toolbox. Many bottlenecks—like unclear requirements, inefficient processes, and infrastructure delays—can’t be solved by AI alone.

  • Balance investment to ensure you’re addressing all productivity levers, not just AI adoption.

Watching for second-order effects

  • More AI-generated code can create new bottlenecks. Extra output can slow PR reviews, increase cognitive load, and lower maintainability.

  • Impact metrics reveal trade-offs early, helping you prevent short-term speed gains from causing long-term technical debt.

Rolling out metrics successfully

  • Aggregate at team or department level. Avoid individual tracking to build trust and reduce fear around AI adoption.

  • Be transparent about data use so developers know it’s for enablement, tool evaluation, and rollout strategy—not performance surveillance.
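
One way to keep aggregation honest is a minimum group size below which metrics are suppressed, so nothing can be traced back to an individual. The threshold and data shape here are policy choices for illustration, not a DX requirement.

```python
# Minimal sketch of team-level aggregation with a minimum group size.
from collections import defaultdict

MIN_GROUP_SIZE = 5

events = [  # illustrative per-developer telemetry rows
    {"team": "payments", "dev": "a", "ai_sessions": 14},
    {"team": "payments", "dev": "b", "ai_sessions": 3},
    {"team": "platform", "dev": "c", "ai_sessions": 9},
]

by_team = defaultdict(list)
for row in events:
    by_team[row["team"]].append(row["ai_sessions"])

for team, sessions in by_team.items():
    if len(sessions) < MIN_GROUP_SIZE:
        print(f"{team}: suppressed (fewer than {MIN_GROUP_SIZE} developers)")
    else:
        print(f"{team}: avg {sum(sessions) / len(sessions):.1f} AI sessions/dev")
```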

In this episode, we cover:

(00:00) Intro

(01:26) The challenge of measuring developer productivity in the AI age

(04:17) Measuring productivity in the AI era — what stays the same and what changes

(07:25) How to use DX’s AI Measurement Framework

(13:10) Measuring AI’s true impact from adoption rates to long-term quality and maintainability

(16:31) Why acceptance rate is flawed — and DX’s approach to tracking AI-authored code

(18:25) Three ways to gather measurement data

(21:55) How Google measures time savings and why self-reported data is misleading

(24:25) How to measure agentic workflows and a case for expanding the definition of developer

(28:50) A case for not overemphasizing AI’s role

(30:31) Measuring second-order effects

(32:26) Audience Q&A: applying metrics in practice

(36:45) Wrap up: best practices for rollout and communication

Where to find Laura Tacho:

• LinkedIn: https://www.linkedin.com/in/lauratacho/

• Website: https://lauratacho.com/

Where to find Abi Noda:

• LinkedIn: https://www.linkedin.com/in/abinoda

• Substack: https://substack.com/@abinoda
