Quiz: Runbook Craft
4 questions
L0 (1 question)
1. What are the five sections every effective runbook should have?
Show answer
1. Trigger: what alert or event activates this runbook.
2. Diagnose: commands to run and questions to answer before acting.
3. Act: the fix, using decision trees for multiple scenarios.
4. Verify: confirm the fix worked with specific checks and thresholds.
5. Escalate: when and how to call for help.
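One way to keep the five sections from being skipped is to treat a runbook as structured data and check it for gaps. The sketch below is only an illustration under assumptions: the `Runbook` class, the `missing_sections` helper, and the sample text in `draft` are hypothetical names and values, not part of any existing tool.

```python
from dataclasses import dataclass, fields


@dataclass
class Runbook:
    """Hypothetical container for the five sections listed above."""
    trigger: str    # alert or event that activates the runbook
    diagnose: str   # commands to run and questions to answer before acting
    act: str        # the fix, with a decision tree when scenarios differ
    verify: str     # specific checks and thresholds confirming the fix
    escalate: str   # when and how to call for help


def missing_sections(runbook: Runbook) -> list[str]:
    """Return the names of sections that are empty or whitespace-only."""
    return [f.name for f in fields(runbook) if not getattr(runbook, f.name).strip()]


# Illustrative draft with the Verify section left blank.
draft = Runbook(
    trigger="Alert: checkout p99 latency above threshold",
    diagnose="Check recent deploys and dependency health",
    act="Roll back the last deploy if it correlates with the alert",
    verify="",
    escalate="Page the service owner if not resolved",
)
print(missing_sections(draft))  # ['verify']
```

A check like this could run in review or CI so that a runbook without a Verify or Escalate section is flagged before anyone relies on it during an incident.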
L1 (1 question)
1. What are the five runbook automation levels (L0-L4) and which level should be your minimum target?
Show answer
L0: Fully manual prose instructions. L1: Copy-paste commands. L2: Scripts with parameters. L3: Triggered scripts requiring human approval. L4: Fully automated self-healing. Target L1 as the minimum: copy-paste commands beat prose every time. L1 is achievable immediately and prevents mistyped commands at 3 AM.
L2 (1 question)
1. Why are metrics-driven thresholds better than vague descriptions in runbooks? Give an example.
Show answer
Vague: 'Check if latency is high.' The on-call engineer does not know what 'high' means. Metrics-driven: 'Check p99 latency. If > 500 ms (normal baseline: 80-120 ms), proceed.' This removes ambiguity, sets a concrete trigger, and provides the baseline so the responder knows what healthy looks like. Every runbook check should include what to check, the threshold, and the normal value.
L3 (1 question)
1. How should you test runbooks, and why is having the author test their own runbook insufficient?