Measuring AI code assistants and agents with the AI Measurement Framework

A data-driven guide to measuring developer productivity in the AI era, cutting through hype to track real impact and ensure sustainable gains.

Listen and watch now on YouTube, Apple, and Spotify.

In this episode of Engineering Enablement, DX CEO Abi Noda and I share practical guidance for measuring developer productivity in the AI era using our AI Measurement Framework. Based on research with industry leaders, vendors, and hundreds of organizations, we walk through how to cut through the hype and make informed decisions about AI adoption.

We talk about which fundamentals of productivity measurement remain unchanged, why metrics like acceptance rate can be misleading, and how to track AI’s real impact across utilization, quality, and cost. We also cover how to measure agentic workflows, expand the definition of “developer” to include AI-enabled contributors, and identify second-order effects before they create long-term problems.

If you’re introducing AI coding tools, exploring autonomous agents, or just trying to separate signal from noise, this episode offers a clear, actionable roadmap for using data to ensure AI delivers sustainable, meaningful gains.

Some takeaways:

AI’s hype vs. reality gap

  • Bold headlines are often misleading. Claims like “90% of code will be written by AI” typically come from cherry-picked studies in narrow scenarios, not representative of the median developer experience.

  • Organizations need their own data. Vendor marketing and public research can set unrealistic expectations—measuring AI’s real-world impact in your own environment is the only way to guide strategy and investment.

AI doesn’t change engineering fundamentals

  • Core principles remain the same. Scalability, maintainability, reliability, and meeting customer needs still define good engineering.

  • AI builds on—not replaces—these foundations. Use AI to lift existing strengths, not as an excuse to rebuild productivity measurement from scratch.

The AI Measurement Framework

  • Three dimensions matter most: utilization (how widely AI is used), impact (how it changes performance), and cost (what you spend on tools, licenses, training).

  • Track them together for the full picture. Over-indexing on one—like utilization—can lead to false conclusions about overall value.
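
To make tracking the three dimensions together concrete, here is a minimal Python sketch of rolling them up side by side per team. The record shape, field names, and numbers are illustrative assumptions, not a DX schema.

```python
# Minimal sketch: roll up the three framework dimensions side by side.
# The input records and field names are illustrative, not a DX API.
from dataclasses import dataclass

@dataclass
class TeamSnapshot:
    team: str
    devs: int                   # developers on the team
    weekly_ai_users: int        # devs who used an AI tool this week (utilization)
    prs_merged: int             # throughput proxy (impact)
    change_failure_rate: float  # quality proxy (impact)
    ai_spend_usd: float         # licenses + usage for the period (cost)

def summarize(s: TeamSnapshot) -> dict:
    return {
        "team": s.team,
        "utilization": s.weekly_ai_users / s.devs,
        "prs_per_dev": s.prs_merged / s.devs,
        "change_failure_rate": s.change_failure_rate,
        "ai_cost_per_dev": s.ai_spend_usd / s.devs,
    }

for snap in [
    TeamSnapshot("payments", 12, 9, 58, 0.04, 480.0),
    TeamSnapshot("platform", 8, 3, 31, 0.02, 320.0),
]:
    print(summarize(snap))
```

Reviewing utilization, impact, and cost in one view like this is what keeps a high adoption number from being mistaken for proof of value on its own.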

The pitfalls of acceptance rate

  • Acceptance rate is unreliable. AI code that’s accepted in the IDE is often rewritten, heavily modified, or deleted before shipping.
  • Better options exist. Tagging PRs for AI contributions or using file-level observability can identify AI-authored changes across all IDEs and tools, avoiding blind spots.
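
As one illustration of PR tagging (a team convention, not a GitHub or DX feature), this sketch counts merged PRs that carry an assumed `ai-assisted` label via the GitHub REST API:

```python
# Minimal sketch of PR tagging: count merged PRs carrying a team-defined
# "ai-assisted" label via the GitHub REST API. The label name and repo
# are placeholders you would replace with your own convention.
import os
import requests

REPO = "your-org/your-repo"
TOKEN = os.environ["GITHUB_TOKEN"]

resp = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls",
    params={"state": "closed", "per_page": 100},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

merged = [pr for pr in resp.json() if pr["merged_at"]]
ai_assisted = [
    pr for pr in merged
    if any(label["name"] == "ai-assisted" for label in pr["labels"])
]
print(f"{len(ai_assisted)}/{len(merged)} merged PRs tagged as AI-assisted")
```

File-level observability gives finer-grained attribution; a label like this is simply the lowest-effort starting point that already avoids the acceptance-rate blind spot.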

Collecting measurement data

  • Tool telemetry (from GitHub, GitLab, or AI vendors) shows patterns in daily and weekly adoption.

  • Quarterly surveys reveal long-term trends in developer satisfaction, productivity, and maintainability perceptions.

  • In-workflow experience sampling asks targeted questions at the moment of work—e.g., “Was this PR authored with AI?”—to get precise, low-bias feedback.
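
For the experience-sampling idea, here is a small sketch that tallies answers to the "Was this PR authored with AI?" question, assuming your PR template includes a checkbox for it. The checkbox wording is a hypothetical team convention, not part of any tool.

```python
# Minimal sketch of in-workflow experience sampling: a PR template asks
# "Was this PR authored with AI?" as a checkbox, and this script tallies
# the answers from PR descriptions. The checkbox wording is an assumed
# team convention.
import re

CHECKBOX = re.compile(r"- \[(x| )\] This PR was authored with AI", re.IGNORECASE)

def sample_answer(pr_body: str) -> str | None:
    """Return 'yes', 'no', or None if the question wasn't answered."""
    match = CHECKBOX.search(pr_body or "")
    if not match:
        return None
    return "yes" if match.group(1).lower() == "x" else "no"

bodies = [
    "Fixes login bug.\n- [x] This PR was authored with AI",
    "Refactor billing.\n- [ ] This PR was authored with AI",
    "Docs update.",  # question skipped -> counted as unanswered
]
answers = [sample_answer(b) for b in bodies]
print({a: answers.count(a) for a in ("yes", "no", None)})
```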

Perception vs. reality in time savings

  • Developers often feel faster with AI—but logs may say otherwise. A meta-study found that self-reports overstated gains; in some cases, AI users were slower.

  • Triangulate survey and system data to confirm that perceived improvements match actual throughput and quality metrics.
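
A toy sketch of that triangulation, with illustrative team names, numbers, and thresholds (not DX metrics):

```python
# Minimal sketch of triangulation: compare self-reported time savings from
# a quarterly survey with a throughput signal from system data, per team.
surveys = {"payments": 0.25, "platform": 0.30}          # reported fraction of time saved
cycle_time_change = {"payments": -0.18, "platform": 0.05}  # vs. last quarter

for team, reported in surveys.items():
    measured = cycle_time_change[team]
    # A large reported saving alongside flat or worse cycle time is a flag
    # to dig deeper, not proof that developers are wrong.
    mismatch = reported >= 0.15 and measured >= 0.0
    print(f"{team}: reported {reported:.0%} saved, cycle time {measured:+.0%}"
          + ("  <- investigate" if mismatch else ""))
```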

Measuring agentic workflows

  • Treat agents as team extensions, not digital employees. Measure productivity for the combined human-agent team, just as you would for a team using CI/CD tools like Jenkins.

  • Focus on maturity, not just usage. There’s a big difference between using AI for autocomplete and delegating multi-step tasks to autonomous loops.

Expanding the definition of developer

  • AI enables more contributors. Designers, PMs, and other non-engineers can now produce functional code and prototypes.
  • Apply the same quality gates—code review, testing, maintainability checks—to their contributions as to full-time engineers.

Thinking beyond AI

  • AI is one tool in the toolbox. Many bottlenecks—like unclear requirements, inefficient processes, and infrastructure delays—can’t be solved by AI alone.

  • Balance investment to ensure you’re addressing all productivity levers, not just AI adoption.

Watching for second-order effects

  • More AI-generated code can create new bottlenecks. Extra output can slow PR reviews, increase cognitive load, and lower maintainability.

  • Impact metrics reveal trade-offs early, helping you prevent short-term speed gains from causing long-term technical debt.

Rolling out metrics successfully

  • Aggregate at team or department level. Avoid individual tracking to build trust and reduce fear around AI adoption.

  • Be transparent about data use so developers know it’s for enablement, tool evaluation, and rollout strategy—not performance surveillance.
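
One way to keep aggregation honest is a minimum group size below which metrics are suppressed, so nothing can be traced back to an individual. The threshold and data shape here are policy choices for illustration, not a DX requirement.

```python
# Minimal sketch of team-level aggregation with a minimum group size.
from collections import defaultdict

MIN_GROUP_SIZE = 5

events = [  # illustrative per-developer telemetry rows
    {"team": "payments", "dev": "a", "ai_sessions": 14},
    {"team": "payments", "dev": "b", "ai_sessions": 3},
    {"team": "platform", "dev": "c", "ai_sessions": 9},
]

by_team = defaultdict(list)
for row in events:
    by_team[row["team"]].append(row["ai_sessions"])

for team, sessions in by_team.items():
    if len(sessions) < MIN_GROUP_SIZE:
        print(f"{team}: suppressed (fewer than {MIN_GROUP_SIZE} developers)")
    else:
        print(f"{team}: avg {sum(sessions) / len(sessions):.1f} AI sessions/dev")
```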

In this episode, we cover:

(00:00) Intro

(01:26) The challenge of measuring developer productivity in the AI age

(04:17) Measuring productivity in the AI era — what stays the same and what changes

(07:25) How to use DX’s AI Measurement Framework

(13:10) Measuring AI’s true impact from adoption rates to long-term quality and maintainability

(16:31) Why acceptance rate is flawed — and DX’s approach to tracking AI-authored code

(18:25) Three ways to gather measurement data

(21:55) How Google measures time savings and why self-reported data is misleading

(24:25) How to measure agentic workflows and a case for expanding the definition of developer

(28:50) A case for not overemphasizing AI’s role

(30:31) Measuring second-order effects

(32:26) Audience Q&A: applying metrics in practice

(36:45) Wrap up: best practices for rollout and communication

Where to find Laura Tacho:

• LinkedIn: https://www.linkedin.com/in/lauratacho/

• Website: https://lauratacho.com/

Where to find Abi Noda:

• LinkedIn: https://www.linkedin.com/in/abinoda

• Substack: https://substack.com/@abinoda
