DORA Metrics & DevEx Footguns

Mistakes that cause gaming, misinterpretation, or wasted effort with DORA metrics and DevEx programs.


1. Optimizing One Metric at the Expense of the Others

The team is told to improve Deployment Frequency. Engineers ship tiny, trivial commits daily (typo fixes, whitespace changes, comment edits) to boost the count. Lead time drops and deploy frequency spikes, but change failure rate climbs because nobody slowed down to write tests for the real features, and MTTR climbs because nobody maintains runbooks for deploys this trivial.

DORA metrics are a system. Improving one while ignoring the others is Goodhart's Law in action: when a measure becomes a target, it ceases to be a good measure.

Fix: Track all four metrics together. Set thresholds on all four before reporting a team as "Elite." Any improvement in one metric that degrades another is a net loss.
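The all-four-thresholds rule can be made mechanical. A minimal sketch, with illustrative thresholds (loosely based on published DORA bands — tune them to your own reporting standard; `DoraSnapshot` and the values below are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class DoraSnapshot:
    deploys_per_week: float       # deployment frequency (production only)
    lead_time_hours: float        # median lead time for changes
    change_failure_rate: float    # fraction of deploys causing incidents
    mttr_hours: float             # mean time to restore service

def is_elite(m: DoraSnapshot) -> bool:
    # "Elite" only if ALL four metrics clear their threshold;
    # a spike in one metric cannot rescue failures in the others.
    return (
        m.deploys_per_week >= 7           # roughly daily or better
        and m.lead_time_hours <= 24       # under a day
        and m.change_failure_rate <= 0.15
        and m.mttr_hours <= 1
    )

# Gamed frequency, degraded CFR and MTTR: a net loss, not Elite.
gamed = DoraSnapshot(deploys_per_week=40, lead_time_hours=2,
                     change_failure_rate=0.30, mttr_hours=6)
print(is_elite(gamed))  # False
```

The conjunction (`and`) is the whole point: reporting any single metric in isolation reintroduces the footgun.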


2. Using DORA as a Performance Review Tool for Individual Engineers

Management attaches DORA metrics to engineer performance reviews. Engineers discover which of their commits triggered incidents and suppress incident reports to protect their CFR score. Others pad deploy counts with trivial changes. MTTR inflates because nobody wants to be "the person who caused the incident."

Fix: DORA metrics are team-level and system-level measurements. Never attach them to individual engineer performance. The moment engineers are measured individually, they game the metrics and the data becomes worthless. Explicitly communicate this from the start of any DORA program.


3. Measuring Deployment Frequency Without Filtering Environments

You count all successful CI pipeline runs (dev, staging, production) as "deploys." Your frequency looks Elite. In reality, production deploys happen once per week and staging runs hundreds of times per week for automated testing.

Fix: Deployment frequency counts ONLY production deploys. Be precise about what "production" means: the environment customers use. If you have multiple production shards, count deploys to any of them, but do not count dev, staging, or test environment deployments.
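A sketch of the filtering step, with a hypothetical deploy event log and made-up environment names — the key is an explicit allowlist of customer-facing environments, including shards:

```python
from datetime import datetime

# Hypothetical deploy event log: (timestamp, environment)
deploys = [
    (datetime(2024, 3, 4, 10, 0), "staging"),
    (datetime(2024, 3, 4, 11, 0), "production"),
    (datetime(2024, 3, 5, 9, 30), "dev"),
    (datetime(2024, 3, 6, 14, 0), "production-eu"),  # a production shard
    (datetime(2024, 3, 7, 16, 0), "staging"),
]

# Environments customers actually use; shards all count.
PRODUCTION_ENVS = ("production", "production-eu", "production-us")

def production_deploy_count(events):
    # dev/staging/test runs are excluded from deployment frequency
    return sum(1 for _, env in events if env in PRODUCTION_ENVS)

print(production_deploy_count(deploys))  # 2, not 5
```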


4. Calculating Lead Time From PR Open, Not From Commit

Lead time for changes should measure from when a commit is made (or when work is complete) to when it's running in production. Some teams measure from PR open date. If a PR sits in draft for 5 days before being opened for review, those 5 days of development time are invisible. The metric looks better than reality.

Fix: Measure lead time from the git timestamp of the first commit on the merged PR's branch to the production deploy. Falling back to the PR open date is acceptable only if commits are always pushed at the moment the PR opens — which the draft-PR workflow above violates.
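A minimal sketch of the calculation, using hypothetical commit and deploy timestamps — taking `min()` over the branch's commits is what recovers the draft-PR time a PR-open-based measurement hides:

```python
from datetime import datetime

def lead_time_hours(commit_timestamps, deployed_at):
    # Clock starts at the FIRST commit on the branch, not at PR open.
    first_commit = min(commit_timestamps)
    return (deployed_at - first_commit).total_seconds() / 3600

commits = [
    datetime(2024, 3, 1, 9, 0),   # work started (PR still in draft)
    datetime(2024, 3, 5, 15, 0),  # last commit, PR opened for review
]
deployed = datetime(2024, 3, 6, 11, 0)

print(lead_time_hours(commits, deployed))  # 122.0 hours, not the ~20 a PR-open clock reports
```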


5. Ignoring the Detection Gap in MTTR

An incident starts at 2am. Your monitoring doesn't alert (alert thresholds too loose, or no on-call during that hour). The on-call engineer sees the ticket at 9am and resolves it at 9:45am. You record MTTR as 45 minutes. Actual user-facing downtime was 7 hours 45 minutes.

Fix: MTTR must start from when the service degraded (SLO burn began, error rate spiked, or first user complaint), not from when an engineer acknowledged the alert. Use your monitoring system's alert firing time as the start. Review and correct MTTR calculations in post-mortems.
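The 2am scenario above, as a sketch with hypothetical timestamps — the only change between the honest and the understated number is which timestamp starts the clock:

```python
from datetime import datetime

def time_to_restore_hours(degradation_started, resolved):
    # Clock starts when the service degraded (alert firing time, SLO
    # burn, first user complaint), not when an engineer acknowledged.
    return (resolved - degradation_started).total_seconds() / 3600

alert_fired  = datetime(2024, 3, 10, 2, 0)   # error rate spiked at 2am
acknowledged = datetime(2024, 3, 10, 9, 0)   # on-call saw the ticket at 9am
resolved     = datetime(2024, 3, 10, 9, 45)

print(time_to_restore_hours(alert_fired, resolved))   # 7.75 — actual downtime
print(time_to_restore_hours(acknowledged, resolved))  # 0.75 — understates by 7h
```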


6. Treating "Change Failure Rate" and "Incident Rate" as Synonyms

Your infrastructure has periodic failures unrelated to code changes: cloud provider outages, hardware failures, certificate expirations, resource exhaustion. You count all P1 incidents in your CFR denominator. The team's CFR climbs because AWS had a bad month. They spend their retrospective blaming themselves for incidents they didn't cause.

Fix: Change Failure Rate = (incidents caused by a deploy) / (total deploys). Distinguish deploy-triggered incidents from infrastructure incidents. Tag incidents at creation: was there a deploy in the preceding hour? Was the incident caused by the deploy or by an external factor? Keep these separate in your metrics store.
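With incidents tagged at creation, the formula is a one-liner. A sketch over a hypothetical tagged incident log:

```python
# Hypothetical incidents, each tagged at creation as deploy-caused
# or external (provider outage, cert expiry, hardware failure).
incidents = [
    {"id": "INC-101", "cause": "deploy"},
    {"id": "INC-102", "cause": "external"},   # cloud provider outage
    {"id": "INC-103", "cause": "deploy"},
    {"id": "INC-104", "cause": "external"},   # certificate expiration
]
total_deploys = 40

deploy_caused = sum(1 for i in incidents if i["cause"] == "deploy")
cfr = deploy_caused / total_deploys          # the actual DORA metric
naive_cfr = len(incidents) / total_deploys   # wrongly blames the team for AWS's bad month

print(f"CFR: {cfr:.0%}, naive: {naive_cfr:.0%}")  # CFR: 5%, naive: 10%
```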


7. Measuring DORA Once and Declaring a Baseline

You measure all four metrics for the first time, present them to leadership, and declare it the baseline. Six months later, you measure again and compare. But the first measurement was taken during a sprint when the team was shipping a major refactor (long PRs, careful review, low deploy frequency). The second measurement was taken during a bug-fix sprint (many small PRs, fast deploys). The comparison is apples to oranges.

Fix: DORA metrics need continuous measurement, not point-in-time snapshots. Use a rolling 4-week or 12-week window to smooth out sprint-to-sprint variation. Track trends, not absolute values. Set up automated data collection so metrics are always current.
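A rolling window is straightforward to compute from raw deploy dates. A sketch over hypothetical data — note how the same date can be read with any window end, so you get a trend line instead of two incomparable snapshots:

```python
from datetime import date, timedelta

# Hypothetical production deploy dates over ~9 weeks.
deploy_dates = [date(2024, 1, 1) + timedelta(days=d)
                for d in (0, 3, 5, 12, 14, 20, 33, 35, 40, 47, 48, 55, 62, 63)]

def deploys_in_window(dates, as_of, weeks=4):
    # Deploy count over a rolling window ending at `as_of`,
    # smoothing out sprint-to-sprint variation.
    start = as_of - timedelta(weeks=weeks)
    return sum(1 for d in dates if start < d <= as_of)

# Evaluate the window weekly to track the trend, not absolute values.
for week in range(5, 10):
    as_of = date(2024, 1, 1) + timedelta(weeks=week)
    print(as_of, deploys_in_window(deploy_dates, as_of))
```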


8. Conflating Improvements in Metrics With Improvements in Outcomes

Deployment frequency doubled. The team celebrates. But customer-reported bugs are up 30%, and the SLA is being missed. The metrics improved but the thing the metrics were supposed to predict (business outcomes) got worse.

Fix: DORA metrics predict outcomes — they are not outcomes themselves. Validate that metric improvements correlate with the outcomes you care about: user satisfaction, feature adoption, revenue impact, SLA compliance. If a metric improves but outcomes don't, the metric measurement is probably wrong or the metric isn't the bottleneck.


9. Running a Survey Once and Calling It a DevEx Program

You send out a developer experience survey to the whole company. Response rate is 40%. Results show "CI is too slow" as the #1 complaint. You spend 3 months reducing CI from 25min to 10min. You don't resurvey. Six months later, nobody knows if the improvement stuck or if new problems emerged.

Fix: DevEx surveys need to be recurring (quarterly at minimum). Trending the same question over time is more valuable than absolute scores. Pair survey data with system metrics (actual CI duration) to validate whether perceived improvements match real ones. Close the loop publicly: "You said CI was slow. We fixed it. Here's what changed."


10. Implementing DORA Without Blameless Culture

Your organization has a blame culture. People hide incidents to avoid punishment. Change failure rate appears to be 0% because incidents are systematically underreported. MTTR appears to be 15 minutes because people close tickets before the issue is truly resolved. DORA scores look Elite while the actual developer experience is miserable.

Fix: DORA measurement requires psychological safety to be accurate. If engineers fear punishment for incidents, they won't report them honestly. Establish blameless post-mortems BEFORE rolling out DORA measurement. Explicitly state that DORA data will not be used for individual performance review. Build trust that surfacing failures is rewarded, not penalized.


11. Ignoring Lead Time Outliers

Your P50 lead time is 2 hours (Elite). You report this as your lead time. But your P95 is 14 days — there's a tail of PRs that take two weeks from commit to production. These are the risky, high-context changes that are most likely to cause incidents. They're invisible in the median.

Fix: Always report lead time with percentiles: P50, P75, P95. The P95 reveals your worst-case pipeline: the changes nobody wants to review, the refactors that require synchronization with external teams, the data migrations that need a maintenance window. These are your highest-value improvement targets.
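A dependency-free sketch using the nearest-rank percentile method, over a hypothetical set of lead times with a long tail — the P50 alone would report 5 hours and hide the ten-day P95:

```python
import math

# Hypothetical lead times (hours) for 20 merged PRs; note the tail.
lead_times = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5,
              5, 6, 8, 10, 24, 48, 96, 168, 240, 336]

def percentile(data, p):
    # Nearest-rank method: the value at position ceil(p/100 * n)
    # in sorted order.
    s = sorted(data)
    k = math.ceil(p / 100 * len(s))
    return s[k - 1]

for p in (50, 75, 95):
    print(f"P{p}: {percentile(lead_times, p)}h")
# P50: 5h, P75: 24h, P95: 240h — the median hides a ten-day tail
```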


12. Measuring the Pipeline, Forgetting the Developer's Local Experience

Your DORA metrics are all Elite. But every new developer takes 3 weeks to get their local environment set up. The docs are outdated. The seed data script is broken. Integration tests only work with specific environment variables that aren't documented anywhere. Onboarding time is 2 months before someone is productive.

Fix: DORA measures the production pipeline, not the development experience. Supplement DORA with: time-to-first-commit for new hires, local build time, local test suite pass rate, and a regular survey asking "what is your biggest daily friction point?" These reveal the friction DORA doesn't capture.
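Of the supplementary metrics above, time-to-first-commit is the easiest to automate. A minimal sketch with hypothetical hire dates:

```python
from datetime import date

def time_to_first_commit_days(start_date, first_commit_date):
    # Days from a new hire's start date to their first merged commit:
    # a simple onboarding-friction signal DORA does not capture.
    return (first_commit_date - start_date).days

# Hypothetical new hires: (start date, date of first merged commit)
hires = [
    (date(2024, 1, 8), date(2024, 1, 31)),
    (date(2024, 2, 5), date(2024, 3, 18)),
]
for start, first in hires:
    print(time_to_first_commit_days(start, first))  # 23, then 42
```

Trending this per cohort of hires, alongside local build time and survey data, surfaces the onboarding friction an Elite pipeline score conceals.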