DORA Metrics & DevEx — Street-Level Ops¶
Quick Diagnosis Commands¶
# Count production deploys in last 7 days (GitHub Actions)
# (GNU date; on macOS, use gdate from coreutils)
gh run list \
  --repo myorg/myapp \
  --workflow=deploy-production.yml \
  --json conclusion,createdAt,headBranch \
  --limit 100 \
  | jq --arg since "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
      '[.[] | select(.conclusion=="success" and .headBranch=="main" and .createdAt > $since)] | length'
# Estimate average CI duration (GitHub Actions)
gh run list \
--repo myorg/myapp \
--workflow=ci.yml \
--json createdAt,updatedAt,conclusion \
--limit 50 \
| jq '[.[] | select(.conclusion=="success") | (.updatedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)] | add/length/60 | "avg_ci_minutes: \(.)"'
# Count incidents in last 30 days (PagerDuty API)
# (total=true is required; without it the API returns "total": null)
curl -s "https://api.pagerduty.com/incidents?statuses[]=resolved&total=true&since=$(date -d '30 days ago' -Iseconds)&until=$(date -Iseconds)&limit=100" \
  -H "Authorization: Token token=$PD_TOKEN" \
  -H "Accept: application/vnd.pagerduty+json;version=2" \
  | jq '.total'
# Git: count merges to main in last 30 days (proxy for deploy frequency if you ship on merge)
git log --merges --since="30 days ago" --oneline origin/main | wc -l
# PR size distribution (rough lead-time indicator)
gh pr list --repo myorg/myapp --state merged --limit 50 --json additions,deletions,createdAt,mergedAt \
| jq '.[] | {additions, deletions, lead_time_hours: ((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 3600}'
Gotcha: Measuring Deployment Frequency by Counting CI Runs¶
Your CI runs on every PR. You count CI "successful runs" and call it deployment frequency. Your "Elite" number is actually the number of test runs, not production deploys. You present this to leadership and set a false baseline.
Rule: Deployment frequency counts only successful deploys TO PRODUCTION. Track these via a dedicated event: a webhook call, a Prometheus counter increment, or a deploy log entry. Do not infer deploys from CI runs.
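One low-friction version of that dedicated event is an append-only deploy log; deployment frequency is then a line count over a date range. A minimal sketch (the `deploy_log.jsonl` path and field names are illustrative, not a standard):

```python
#!/usr/bin/env python3
# record_deploy.py -- append one JSON line per production deploy (illustrative
# sketch; the path and field names are this example's own, not a standard).
import datetime
import json
import sys

def record_deploy(sha: str, repo: str, log_path: str = "deploy_log.jsonl") -> dict:
    """Append a single deploy event to the log and return it."""
    event = {
        "sha": sha,
        "repo": repo,
        "environment": "production",
        "deployed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")
    return event

if __name__ == "__main__" and len(sys.argv) > 2:
    record_deploy(sys.argv[1], sys.argv[2])
```

Call it only from the deploy job, never from CI test runs; the metric then falls out of counting lines in a date range, with no inference.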
Gotcha: Using P1 Incident Count as Change Failure Rate¶
You calculate change failure rate as P1 incidents / total deploys. But most P1s in your system are infrastructure failures unrelated to code deploys (AWS outage, cert expiration, human error in config). Your CFR shows 40% and the team panics.
Rule: Change failure rate measures incidents CAUSED BY a deploy. Tag incidents with deploy_triggered: true at incident creation time. Correlate: was a deploy made within the hour before the incident opened? Only count those. Pure infrastructure failures are tracked separately as a reliability/SLA metric.
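The deploy-within-the-hour correlation can be sketched in a few lines (function and variable names here are invented for illustration; wire it to your real deploy and incident records):

```python
# Flag incidents that opened within 60 minutes of a production deploy.
# Illustrative sketch; the data shapes are assumptions, not any tool's API.
from datetime import datetime, timedelta

def deploy_triggered(incident_start: datetime,
                     deploy_times: list[datetime],
                     window: timedelta = timedelta(hours=1)) -> bool:
    """True if any deploy landed in the window before the incident opened."""
    return any(
        timedelta(0) <= incident_start - d <= window
        for d in deploy_times
    )

deploys = [datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 14, 0)]
print(deploy_triggered(datetime(2024, 5, 1, 14, 40), deploys))  # True: deploy 40 min prior
print(deploy_triggered(datetime(2024, 5, 1, 12, 0), deploys))   # False: nearest deploy 3h prior
```

Only incidents where this returns True count toward CFR; everything else goes in the reliability bucket.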
Gotcha: MTTR Clock Starts When the Ticket Is Opened, Not When the Incident Started¶
Your oncall engineer notices degraded metrics at 2am, investigates for 30 minutes, confirms it's an incident, THEN opens the ticket at 2:30am. The ticket is resolved at 3am. MTTR = 30 minutes. Actual time users were affected: 90 minutes, because the degradation began around 1:30am, half an hour before anyone was looking.
Rule: MTTR should start from when the service degraded, not when the ticket was opened. Use alert firing time as the start, or better, use the time your error rate SLO started burning. Many incident management platforms (PagerDuty, Grafana OnCall) capture alert trigger time — use that.
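Computing MTTR from both clocks side by side makes the gap concrete (the incident record below is a hypothetical matching the 2am scenario above):

```python
# Ticket-clock vs alert-clock MTTR for the same incident.
# Hypothetical record; in practice, pull alert trigger time from your pager.
from datetime import datetime

incidents = [
    {"alert_fired":   datetime(2024, 5, 1, 1, 30),  # SLO burn begins
     "ticket_opened": datetime(2024, 5, 1, 2, 30),  # engineer files the ticket
     "resolved":      datetime(2024, 5, 1, 3, 0)},
]

def mttr_minutes(incidents, start_field):
    """Mean minutes from start_field to resolution."""
    total = sum((i["resolved"] - i[start_field]).total_seconds() for i in incidents)
    return total / len(incidents) / 60

print(mttr_minutes(incidents, "ticket_opened"))  # 30.0 (flattering)
print(mttr_minutes(incidents, "alert_fired"))    # 90.0 (closer to user impact)
```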
Pattern: Lightweight DORA Tracking With a Postgres Table¶
For teams without a dedicated DORA tool, a simple table + cron query covers 80% of the value:
-- Schema
CREATE TABLE deploy_events (
id SERIAL PRIMARY KEY,
sha VARCHAR(40) NOT NULL,
repo VARCHAR(200) NOT NULL,
environment VARCHAR(50) NOT NULL,
deployed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deployed_by VARCHAR(100),
status VARCHAR(20) NOT NULL DEFAULT 'success', -- success | rollback | hotfix
caused_incident BOOLEAN DEFAULT FALSE,
incident_id VARCHAR(100),
lead_time_seconds INTEGER -- seconds from commit to deploy
);
-- Deployment frequency (last 4 weeks, by week)
SELECT
DATE_TRUNC('week', deployed_at) AS week,
COUNT(*) FILTER (WHERE status = 'success') AS deploys,
ROUND(COUNT(*) FILTER (WHERE status = 'success')::NUMERIC / 7, 1) AS deploys_per_day
FROM deploy_events
WHERE environment = 'production'
AND deployed_at >= NOW() - INTERVAL '28 days'
GROUP BY 1 ORDER BY 1;
-- Lead time percentiles
SELECT
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY lead_time_seconds)/3600 AS p50_hours,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY lead_time_seconds)/3600 AS p95_hours
FROM deploy_events
WHERE environment = 'production' AND status = 'success'
AND deployed_at >= NOW() - INTERVAL '30 days';
-- Change failure rate
SELECT
ROUND(100.0 * COUNT(*) FILTER (WHERE caused_incident) / COUNT(*), 1) AS cfr_pct,
COUNT(*) FILTER (WHERE caused_incident) AS failures,
COUNT(*) AS total
FROM deploy_events
WHERE environment = 'production'
AND deployed_at >= NOW() - INTERVAL '30 days';
Emit to this table from your deploy script:
#!/bin/bash
# deploy.sh — called from CI after a successful production deploy
# (GNU date for epoch conversion; on macOS, use gdate)
SHA=${GIT_SHA:-$(git rev-parse HEAD)}
DEPLOY_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
COMMIT_TIME=$(git log -1 --format=%cI "$SHA")
LEAD_TIME=$(( $(date -d "$DEPLOY_TIME" +%s) - $(date -d "$COMMIT_TIME" +%s) ))
psql "$METRICS_DB_URL" <<EOF
INSERT INTO deploy_events (sha, repo, environment, deployed_at, deployed_by, status, lead_time_seconds)
VALUES ('$SHA', '$REPO', 'production', '$DEPLOY_TIME', '$GITHUB_ACTOR', 'success', $LEAD_TIME);
EOF
Pattern: Flaky Test Tracker¶
Flaky tests are a DevEx killer — they create noise in CI, erode trust in test results, and add lead time through spurious retries.
#!/bin/bash
# track-flaky-tests.sh
# Run after each CI suite and parse JUnit XML for test results
JUNIT_XML=$1
SUITE_NAME=$2
RUN_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)
python3 - <<EOF
import xml.etree.ElementTree as ET
import psycopg2, os

tree = ET.parse("$JUNIT_XML")
root = tree.getroot()
conn = psycopg2.connect(os.environ["METRICS_DB_URL"])
cur = conn.cursor()
for testcase in root.iter("testcase"):
    name = testcase.attrib.get("classname", "") + "." + testcase.attrib.get("name", "")
    failed = testcase.find("failure") is not None or testcase.find("error") is not None
    cur.execute(
        "INSERT INTO test_runs (test_name, suite, status, ran_at) VALUES (%s, %s, %s, %s)",
        (name, "$SUITE_NAME", "failed" if failed else "passed", "$RUN_AT"),
    )
conn.commit()
conn.close()
EOF
# Report top flaky tests (5-95% failure rate, at least 20 runs; a test that
# always fails is broken, not flaky)
psql $METRICS_DB_URL <<EOF
SELECT test_name,
COUNT(*) runs,
ROUND(100.0 * SUM(CASE WHEN status='failed' THEN 1 ELSE 0 END) / COUNT(*), 1) flake_pct
FROM test_runs
WHERE ran_at >= NOW() - INTERVAL '14 days'
GROUP BY 1
HAVING COUNT(*) >= 20
AND SUM(CASE WHEN status='failed' THEN 1 ELSE 0 END)::NUMERIC / COUNT(*) BETWEEN 0.05 AND 0.95
ORDER BY flake_pct DESC
LIMIT 10;
EOF
Scenario: Team Presents "Elite" DORA Score But Reliability Is Terrible¶
The team shows Deployment Frequency = 15/day, Lead Time = 45 min. They claim Elite. But users are complaining about constant bugs, and SREs are running ragged with incidents.
Investigation:
# Check the other two metrics they didn't mention
# Query change failure rate
psql $METRICS_DB <<'EOF'
SELECT
ROUND(100.0 * COUNT(*) FILTER (WHERE caused_incident) / COUNT(*), 1) AS cfr_pct
FROM deploy_events
WHERE environment = 'production'
AND deployed_at >= NOW() - INTERVAL '30 days';
EOF
# Result: 45% — well above the 15% Elite threshold
# Check MTTR
psql $METRICS_DB <<'EOF'
SELECT AVG(EXTRACT(EPOCH FROM (resolved_at - started_at))/3600) AS avg_mttr_hours
FROM incidents WHERE severity = 'P1' AND started_at >= NOW() - INTERVAL '30 days';
EOF
# Result: 6.3 hours — far from Elite's < 1 hour
Finding: The team optimized throughput at the expense of stability. High deploy frequency and low lead time are meaningless if every third deploy causes an incident that takes 6 hours to fix. The correct diagnosis: improve testing, implement canary deploys, build runbooks.
Scenario: DORA Metrics Are Good on Paper but Devs Are Miserable¶
Surveys show developer satisfaction at 40%. The DORA dashboard shows Elite. Something doesn't add up.
Investigation: DORA measures the pipeline, not the human experience. What's missing:
- Local build takes 18 minutes
- CI environment is flaky (30% of runs require manual retry)
- No documentation for onboarding
- Code review turnaround is 4+ days
- Engineers attend 30+ hours of meetings per week
Approach: Run the SPACE survey or use the DORA DevEx supplement questions. Identify the top 3 DevEx complaints from the team. These are the metric-invisible friction points.
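If the survey responses are tagged, ranking the top 3 complaints is a one-liner over a counter (the tag strings below are invented examples):

```python
# Rank tagged DevEx survey responses to surface the top friction points.
# The tags are invented for illustration; feed in your real survey export.
from collections import Counter

responses = [
    "slow-local-build", "flaky-ci", "slow-local-build", "review-latency",
    "flaky-ci", "slow-local-build", "meetings", "review-latency",
]
top3 = Counter(responses).most_common(3)
print(top3)  # [('slow-local-build', 3), ...]
```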
# Proxy metric: CI failure rate (flaky CI shows up as failures that pass on rerun)
gh run list --repo myorg/myapp --limit 200 --json databaseId,conclusion \
  | jq '[.[] | select(.conclusion=="failure")] | length'
# Compare to total runs — a high failure rate despite green merges = suspect flakiness
# Meeting time (requires calendar data)
# Most devex platforms (DX, Pluralsight Flow, LinearB) track this directly
# Code review latency
gh pr list --state merged --limit 100 --json createdAt,mergedAt,reviewDecision \
| jq '[.[] | {
hours_open: ((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 3600
}] | map(.hours_open) | add/length'
Useful One-Liners¶
# Count deploys to production this week (GitHub Actions; GNU date)
gh run list --workflow=deploy.yml --json conclusion,createdAt \
  | jq --arg since "$(date -u -d 'last monday' +%Y-%m-%dT%H:%M:%SZ)" \
      '[.[] | select(.conclusion=="success" and .createdAt > $since)] | length'
# DORA band classification from deployment frequency
DEPLOYS_PER_DAY=3.5
python3 -c "
d = $DEPLOYS_PER_DAY
if d >= 1: print('Elite (on-demand, multiple/day)')
elif d >= 1/7: print('High (weekly)')
elif d >= 1/30: print('Medium (monthly)')
else: print('Low (<monthly)')
"
# Lead time: time from last commit on PR to merge
gh pr view 123 --json commits,mergedAt --repo myorg/myapp \
| jq '{ lead_h: ((.mergedAt | fromdateiso8601) - (.commits[-1].committedDate | fromdateiso8601)) / 3600 }'
# MTTR from PagerDuty: resolved incidents last 30d
curl -s "https://api.pagerduty.com/incidents?statuses[]=resolved&limit=100&since=$(date -d '30 days ago' -Iseconds)" \
-H "Authorization: Token token=$PD_TOKEN" \
-H "Accept: application/vnd.pagerduty+json;version=2" \
| jq '[.incidents[] | ((.resolved_at | fromdateiso8601) - (.created_at | fromdateiso8601)) / 60] | add/length | "avg MTTR: \(.) minutes"'
# Quick DevEx signal: what % of CI runs fail?
gh run list --limit 200 --json conclusion --repo myorg/myapp \
  | jq '{total: length, failed: [.[] | select(.conclusion=="failure")] | length} | .failed/.total * 100 | "failure rate: \(.)%"'
# Find recent large commits (big commits = risky changes = high CFR potential)
# (rough: $4 of the --stat summary line is usually the insertion count)
git log --since="30 days ago" --format="%H %s" | while read -r sha msg; do
  lines=$(git show --stat "$sha" | tail -1 | awk '{print $4}')
  echo "$lines $sha $msg"
done | sort -rn | head -10