
Pattern: Runbook with No Contacts

ID: FP-050
Family: Human Error Amplifier
Frequency: Very Common
Blast Radius: Incident response delay
Detection Difficulty: Obvious (during incident)

The Shape

A runbook for a critical operation says "escalate to the database team" or "contact the security team" without naming specific people, providing contact information, or defining escalation paths. At 3am, a new on-call engineer cannot perform the operation without subject-matter expert help. They spend 20–40 minutes finding who to contact, how to reach them, and whether they're available. The runbook had the procedure but not the people.

How You'll See It

In Incident Command

# Database Failover Runbook (excerpt)
Step 5: If failover doesn't complete in 30 minutes, contact the DBA team for assistance.

At 3am, the on-call engineer does not know who the DBA team is, which Slack channel they use, whether there's an on-call DBA, or who the primary contact is. They search the company directory (if one exists), look through Slack for a DBA channel, and spend 25 minutes finding someone. Meanwhile, the incident continues.

In Kubernetes

A post-mortem review reveals that 35 minutes of the 47-minute incident were spent finding someone with production database access. The runbook said "contact someone with DB access" but didn't specify who had that access or how to reach them after hours.

In CI/CD

The CD pipeline fails with a cryptic certificate error. The runbook says "contact the certificates team," but there is no certificates team anymore: it was reorganized 8 months ago. Not knowing what else to do, the on-call engineer files a ticket (a daytime workflow) at 2am.

The Tell

Incident timeline shows 20+ minutes of "finding someone to contact" or "looking up who owns this system." The runbook references a team name, not a person or an on-call mechanism. The contact information is out of date (team restructured, person left).
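The first signal above can be checked mechanically during a post-mortem. The following is a minimal sketch, assuming the incident timeline is available as a list of phases with a duration and a free-text label; the `TimelineEntry` type, the keyword list, and the function names are illustrative assumptions, not part of any incident tool.

```python
from dataclasses import dataclass


@dataclass
class TimelineEntry:
    minutes: int   # duration of this phase, in minutes
    activity: str  # free-text label copied from the incident timeline


# Assumed keywords that mark time spent hunting for a person rather than fixing.
SEARCH_HINTS = ("finding someone", "looking up who owns", "who to contact")


def contact_search_minutes(timeline: list[TimelineEntry]) -> int:
    """Sum the minutes of every phase whose label mentions a contact search."""
    return sum(
        entry.minutes
        for entry in timeline
        if any(hint in entry.activity.lower() for hint in SEARCH_HINTS)
    )


def shows_the_tell(timeline: list[TimelineEntry], threshold_minutes: int = 20) -> bool:
    """True when contact hunting reaches the 20+ minute threshold."""
    return contact_search_minutes(timeline) >= threshold_minutes
```

Keyword matching is crude, so a result like this is a prompt to read the timeline, not a verdict.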

Common Misdiagnosis

| Looks Like | But Actually | How to Tell the Difference |
| --- | --- | --- |
| Slow incident response | Runbook missing contacts | Timeline shows most time spent on communication, not technical diagnosis |
| Wrong person paged | Runbook didn't specify the right person | Contact info exists elsewhere; runbook just doesn't reference it |

The Fix (Generic)

  1. Immediate: During the current incident, escalate via PagerDuty/OpsGenie if available and check the company's on-call schedule.
  2. Short-term: Audit all runbooks for "contact [team]" references; replace with specific person names, on-call escalation paths, and direct contact information (phone, Slack handle).
  3. Long-term: Link runbooks to PagerDuty escalation policies; run quarterly "runbook drills" where on-call engineers execute runbooks from cold start and log any gaps; require a "contacts" section at the top of every runbook.
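The short-term audit and the "contacts section" requirement can be partly automated. The sketch below is one possible heuristic, assuming runbooks are plain text or Markdown; the regular expressions, the 15-line window, and the function names are assumptions chosen for illustration and will need tuning against real runbooks.

```python
import re

# Assumed phrasing for vague references, e.g. "contact the DBA team".
VAGUE_REF = re.compile(
    r"\b(?:contact|escalate to|reach out to|notify)\s+(?:the\s+)?([\w\- ]+?\s+team)\b",
    re.IGNORECASE,
)
# Assumed markers of a concrete contact: an @handle or a phone-like number.
CONCRETE = re.compile(r"@[\w\-]+|\+?\d[\d\-\s]{6,}\d")


def audit_runbook(text: str) -> list[tuple[int, str]]:
    """Return (line_number, team_name) for each vague team reference
    that has no concrete contact on the same line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = VAGUE_REF.search(line)
        if match and not CONCRETE.search(line):
            findings.append((lineno, match.group(1).strip()))
    return findings


def has_contacts_section(text: str, head_lines: int = 15) -> bool:
    """True if a 'Contacts' heading appears within the first head_lines lines."""
    head = text.splitlines()[:head_lines]
    return any(
        re.match(r"^#*\s*contacts\b", line.strip(), re.IGNORECASE) for line in head
    )
```

Run against the failover excerpt earlier in this pattern, `audit_runbook` would flag Step 5 as a reference to "DBA team" with no way to reach it. The check only catches the phrasings it knows about, so it supplements the quarterly drills rather than replacing them.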

Real-World Examples

  • Example 1: The database failover runbook said "contact DBA," but there was no DBA on-call. The on-call engineer spent 30 minutes finding the DBA's personal phone number in an old email. The DBA answered, and the failover itself took 5 minutes. Of the 40-minute incident, 35 minutes went to finding and reaching the person.
  • Example 2: The TLS certificate expiry runbook said "certificates team will handle." After a reorg, the certificates team had become part of a larger security team, and no one on that team knew who handled certificates. The cert renewal took 20 minutes; finding someone who could do it took 65 minutes.

War Story

Page: "certificate expired on api.example.com." I opened the runbook: "Contact the PKI team for certificate renewal." Searched Slack: no "PKI team" channel. Searched the directory: no "PKI team." Found a "Security" channel. Posted there at 3am: no response for 15 minutes. Found an old Confluence page with a list of "security contacts" — half the people had left the company. Found one who was still there. DM'd them: no response (3am). Found their manager. Manager found the right person. Total time: 68 minutes to find someone. Renewal itself: 4 minutes. Runbook now has: "Certificate issues: page @security-oncall in PagerDuty (policy: sev2-oncall). Primary: @alice. Backup: @bob. Emergency: @carol (CTO for cert emergency)."
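A contact list like the one that ended up in this runbook goes stale the same way the old Confluence page did. The following is a minimal sketch of a freshness check, assuming you can export the set of currently active handles from a directory or chat tool; the `Contact` type, the `stale_contacts` function, and the sample active set are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    role: str    # position in the escalation chain, e.g. "Primary"
    handle: str  # chat or paging handle, e.g. "@alice"


def stale_contacts(contacts: list[Contact], active_handles: set[str]) -> list[Contact]:
    """Return the runbook contacts whose handle is no longer active."""
    return [c for c in contacts if c.handle not in active_handles]


# Escalation chain from the war story's updated runbook.
runbook_contacts = [
    Contact("On-call", "@security-oncall"),
    Contact("Primary", "@alice"),
    Contact("Backup", "@bob"),
    Contact("Emergency", "@carol"),
]

# Hypothetical directory export in which @bob has left the company.
active = {"@security-oncall", "@alice", "@carol"}
for contact in stale_contacts(runbook_contacts, active):
    print(f"Stale runbook contact: {contact.role} {contact.handle}")
```

Running a check like this on a schedule surfaces departures before an incident does.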

Cross-References

  • Topic Packs: incident-command
  • Footguns: incident-command/footguns.md — "Runbook says 'contact team' with no names"
  • Related Patterns: FP-051 (missing escalation criteria — same runbook quality issue), FP-049 (port-forward as fix — incident management failure)