πŸ“‰πŸ“Š The Neuro-Coder (Part 5): Beyond DORA (Why Velocity is Meaningless)

πŸ“‰πŸ“Š The Neuro-Coder (Part 5): Beyond DORA (Why Velocity is Meaningless)

This is Part 5 of The Neuro-Coder. In Part 4, we looked at the human cost. Now, we look at the business cost.


If you are an Engineering Manager using “Velocity” or “Lines of Code” to measure your team in 2026, stop.

You are measuring noise.

For decades, we’ve used proxies for productivity. We counted tickets closed. We counted commits. In a world where writing code was the constraint, these were… okay proxies.

But AI has removed the constraint. Writing code is now essentially free. A junior developer with an AI coding assistant can generate 10,000 lines of code in an afternoon. If you measure output, that junior is your “Top Performer.”

In reality, they might be your biggest liability.

In my recent research on developer psychology, engineers explicitly rejected “writing code by the kilogram.” They are shifting their definition of productivity from Output (Velocity) to Outcome (Value). Yet, our metrics haven’t caught up.

1. The GitClear Warning: The Rise of “Code Churn”

The data backs this up. In 2024, GitClear analyzed 150 million lines of code. They found a disturbing trend:

“Code Churn”β€”lines of code that are written, pushed, and then deleted or rewritten within 2 weeksβ€”is skyrocketing.

The Counter-Argument: Some argue this is just “Rapid Prototyping.” If code is cheap, maybe we should delete more of it?

The Reality: We need to distinguish between Exploratory Churn (good experimentation) and Correctional Churn (bad rework). The GitClear data suggests we are seeing the latter: a downward pressure on code quality. We aren’t iterating; we are fixing mistakes we shouldn’t have made.

We are experiencing “Inflation” in software. There is more code, but it’s worth less. We are shipping “Bloatware” because it’s easier to add new AI lines than to refactor existing human lines. Code Reuse is dropping because AI doesn’t know your internal libraries; it just invents new ones.
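
If you want to spot-check churn in your own repositories, one rough approach is to replay `git log --numstat` and count lines added to a file that are matched by deletions in that same file within two weeks. The sketch below is a coarse, file-level proxy under those assumptions; it is not GitClear’s line-level methodology.

```python
#!/usr/bin/env python3
"""Coarse churn proxy: lines added that are deleted from the same file within 14 days.

File-level approximation for trend-spotting only -- not GitClear's method.
"""
import subprocess
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=14)  # assumed churn window

def numstat_events(repo="."):
    """Yield (date, path, added, deleted) tuples from `git log --numstat`."""
    out = subprocess.run(
        ["git", "log", "--numstat", "--date=iso-strict", "--pretty=format:@%ad"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    date = None
    for line in out.splitlines():
        if line.startswith("@"):
            date = datetime.fromisoformat(line[1:])
        elif line.strip():
            added, deleted, path = line.split("\t", 2)
            if added != "-":  # skip binary files
                yield date, path, int(added), int(deleted)

def churn_ratio(repo="."):
    """Fraction of added lines cancelled by deletions in the same file within WINDOW."""
    events = sorted(numstat_events(repo), key=lambda e: e[0])  # oldest first
    per_file = defaultdict(list)   # path -> [(date_added, lines_still_live)]
    added_total = churned = 0
    for date, path, added, deleted in events:
        added_total += added
        remaining = deleted
        for i, (d0, live) in enumerate(per_file[path]):
            if remaining <= 0:
                break
            if date - d0 <= WINDOW:  # deletion lands inside the churn window
                hit = min(live, remaining)
                churned += hit
                remaining -= hit
                per_file[path][i] = (d0, live - hit)
        per_file[path].append((date, added))
    return churned / added_total if added_total else 0.0

if __name__ == "__main__":
    print(f"2-week churn proxy: {churn_ratio():.1%} of added lines")
```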

2. The DORA Paradox: Balancing Throughput and Instability

We love DORA metrics. They are the gold standard because they take a high-level view of the entire delivery process, focusing on two key factors:

  • Throughput: How many changes move through the system (Velocity).
  • Instability: How often those changes break things (Quality).

Taken together, they give teams a system-level picture of their software delivery performance.

The 5 Core DORA Metrics

In the AI era, we must track all five metrics to see the full picture:

  1. Lead time for changes: The time it takes for a change to go from being committed to version control to running in production.
  2. Deployment frequency: The number of deployments over a given period.
  3. Change fail rate: The ratio of deployments that require immediate intervention (hotfix/rollback).
  4. Failed deployment recovery time: The time it takes to recover from a failure.
  5. Rework rate: The ratio of deployments that were unplanned, performed only to address a user-facing bug or incident.
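
As a minimal sketch, here is what computing these five from a deployment log could look like. The `Deployment` record and its fields are illustrative assumptions about what your CI/CD and incident tooling export, not a DORA-prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class Deployment:
    """Illustrative deployment record; field names are assumptions."""
    committed_at: datetime          # first commit of the change
    deployed_at: datetime           # landed in production
    failed: bool                    # needed an immediate hotfix or rollback
    recovered_at: datetime | None   # service restored, if it failed
    unplanned: bool                 # shipped only to fix a user-facing bug

def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Compute the five core metrics over a non-empty list of deployments."""
    n = len(deploys)
    failures = [d for d in deploys if d.failed]
    recoveries = [
        (d.recovered_at - d.deployed_at).total_seconds() / 3600
        for d in failures if d.recovered_at
    ]
    return {
        # 1. Lead time for changes: commit -> production, median hours.
        "lead_time_hours": median(
            (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys
        ),
        # 2. Deployment frequency: deployments per day over the period.
        "deploys_per_day": n / period_days,
        # 3. Change fail rate: share of deployments needing immediate intervention.
        "change_fail_rate": len(failures) / n,
        # 4. Failed deployment recovery time: median hours to restore service.
        "recovery_hours": median(recoveries) if recoveries else 0.0,
        # 5. Rework rate: share of deployments that were unplanned bug fixes.
        "rework_rate": sum(d.unplanned for d in deploys) / n,
    }
```

Watching all five together is the point: deploys_per_day can double while change_fail_rate and rework_rate quietly climb.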

The “Goodhart” Trap

Goodhart’s Law states: “When a measure becomes a target, it ceases to be a good measure.”

In the AI era, metrics like Lead Time are becoming easy targets. AI drives Throughput to record highs, but it is silently increasing Instability. If your Deployment Frequency doubles but your Rework Rate also spikes, you haven’t improved; you’ve just traded quality for speed.

3. Beyond Metrics: A Framework for Measurement

DORA’s latest guidance emphasizes that metrics alone aren’t enough. We need a Measurement Frameworkβ€”a structured way to link our data to our goals. We shouldn’t just count things because we can; we should measure what matters.

This typically follows a Goals -> Signals -> Metrics approach:

  • Goal: “Improve Developer Efficiency without burning out the team.”
  • Signal: “Developers feel confident in the code they ship.”
  • Metric: “Rework Rate” or “Code Review Sentiment.”
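
One lightweight way to keep that chain honest is to write the mapping down next to the dashboards it drives, so every metric can be traced back to a goal. The entries below are illustrative, not a prescribed set:

```python
# Illustrative Goals -> Signals -> Metrics map; the goals and signals here are
# assumptions to adapt, not a standard taxonomy.
measurement_framework = {
    "Improve developer efficiency without burning out the team": {
        "Developers feel confident in the code they ship": [
            "rework_rate",            # unplanned fixes as a share of deployments
            "code_review_sentiment",  # from periodic surveys
        ],
        "Review load stays sustainable for senior engineers": [
            "review_density",         # comments and iterations per PR
        ],
    },
}

# Any metric you collect that has no parent goal or signal is a candidate for
# deletion from the dashboard.
```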

To build this framework in an AI-heavy environment, we should pay close attention to the quality signals that expose “Vibe Coding.”

A. Rework Rate (The Stability Proxy)

What percentage of PRs require significant changes before merging?

While “Reliability” (uptime) is the ultimate measure, it is a lagging indicator. Rework Rate is our leading indicator. If your rework rate is doubling, you have a “Vibe Coding” problem. Developers are accepting AI garbage and hoping the reviewer catches it.
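
One way to approximate that percentage is to compare the lines changed during review against the size of the PR when it first asked for review. A minimal sketch over illustrative PR records (the field names and the 25% threshold are assumptions):

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    """Illustrative PR record; fields are assumptions about your data export."""
    lines_at_first_review: int    # diff size when review was first requested
    lines_changed_in_review: int  # further changes pushed before merge

SIGNIFICANT_REWORK = 0.25  # assumed threshold: a quarter of the PR rewritten in review

def pr_rework_rate(prs: list[PullRequest]) -> float:
    """Share of merged PRs that needed significant changes before merging."""
    reworked = sum(
        1 for pr in prs
        if pr.lines_at_first_review
        and pr.lines_changed_in_review / pr.lines_at_first_review > SIGNIFICANT_REWORK
    )
    return reworked / len(prs) if prs else 0.0
```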

B. Review Density (The Burnout Metric)

How many comments/iterations does a PR generate? If your PR comments are skyrocketing, you are burning out your Seniors. You are paying a high-latency “review tax” on every line of AI code.
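
The same PR export can answer this one. A sketch, with the same caveat that the record fields are assumptions:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class ReviewedPR:
    """Illustrative review record; field names are assumptions."""
    review_comments: int   # human comments left on the diff
    review_rounds: int     # push / re-review iterations before merge
    lines_changed: int

def review_density(prs: list[ReviewedPR]) -> dict:
    """Average review effort per PR -- a proxy for the 'review tax' on seniors."""
    return {
        "comments_per_pr": mean(pr.review_comments for pr in prs),
        "rounds_per_pr": mean(pr.review_rounds for pr in prs),
        "comments_per_100_lines": mean(
            100 * pr.review_comments / max(pr.lines_changed, 1) for pr in prs
        ),
    }
```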

C. The Cognitive Load Index

Stop guessing. Survey your team. DORA research highlights that User Satisfaction (how the developer feels) is a leading indicator of technical performance.

  • “How much of your time is spent debugging code you didn’t write?”
  • “Do you understand the code you shipped yesterday?”

If the answer to the second question is “No,” you have a crisis of ownership, regardless of your Deployment Frequency.
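
A sketch of how those two questions could be rolled into a single trackable number. The 1-5 scale, the 0-100 normalisation, and the equal weighting are all assumptions, not a DORA instrument:

```python
from statistics import mean

def cognitive_load_index(responses: list[dict]) -> float:
    """Average a two-question survey into a 0-100 index (higher = healthier).

    Each response is assumed to look like:
      {"pct_time_debugging_code_you_did_not_write": 40,  # 0-100
       "understands_yesterdays_code": 2}                 # 1 (no) .. 5 (fully)
    """
    def score(r: dict) -> float:
        debugging = 1 - r["pct_time_debugging_code_you_did_not_write"] / 100
        ownership = (r["understands_yesterdays_code"] - 1) / 4
        return 100 * (debugging + ownership) / 2  # equal weighting is an assumption
    return mean(score(r) for r in responses)
```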

β˜• The Takeaway: Problems Solved > Lines Written

The most successful teams in the AI era won’t be the ones who write the most code. They will be the ones who write the least code to solve the problem.

AI wants to be verbose. It wants to give you 50 lines when 5 would do. Your job as a leader is to value the Net Negative Line Count. Celebrate the deletion. Celebrate the elegance.
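
If you want to make deletions visible, a trivial starting point is the net line delta of merged work; `git log --shortstat` already has the numbers, and the aggregation below is just an illustration:

```python
import re
import subprocess

def net_line_delta(since: str = "1 month ago", repo: str = ".") -> int:
    """Net lines added minus lines deleted on the current branch since `since`."""
    out = subprocess.run(
        ["git", "log", "--shortstat", f"--since={since}", "--pretty=format:"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    added = sum(int(n) for n in re.findall(r"(\d+) insertions?\(\+\)", out))
    deleted = sum(int(n) for n in re.findall(r"(\d+) deletions?\(-\)", out))
    return added - deleted  # a negative number is worth celebrating

if __name__ == "__main__":
    print(f"Net line delta this month: {net_line_delta():+d}")
```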

In the final part of this series, Part 6, we’ll look at the solution. How do we redesign our workflowβ€”and our IDEsβ€”to regain control? We’ll explore the philosophy of “Slow AI”.


πŸ”¬ The Hypothesis & The Request

This post proposes the following hypothesis:

Hypothesis: In an era where code generation is free, traditional DORA metrics like Deployment Frequency are susceptible to inflation. We must shift to Resilience Metrics (Rework Rate, Cognitive Load) to measure true health.

Research Question I’d Love to See Answered:

Correlation studies. Is there a link between high “AI Code Generation Rate” and “Production Incident Rate” over a 12-month period? Does shipping faster with AI actually lead to breaking more things?


πŸ“š Further Reading