Quiz: Alerting Rules

3 questions

L1 (1 question)

1. How does Alertmanager route and group alerts?

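route: defines a tree of routes, each with label matchers. Every alert enters at the root route and is passed to the first child route whose matchers match, then recursively to that child's children. If no child of a route matches, that route handles the alert itself. Setting continue: true on a route lets matching carry on to its later siblings.
group_by: aggregates alerts that share the same values for the listed labels into one notification (e.g., group_by: [alertname, cluster]).
group_wait: how long to wait after a new group is created before sending its first notification, so related alerts can be batched together.
group_interval: how long to wait before notifying about new alerts added to a group that has already been notified.
repeat_interval: how long to wait before re-sending a notification for alerts that are still firing.
Receivers define where alerts go (Slack, PagerDuty, email).
inhibit_rules: mute target alerts while a source alert (typically a broader or more severe one) is firing and the labels listed in equal have the same values on both.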
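A minimal Python sketch of the routing and grouping behaviour described above, assuming equality-only matchers. It deliberately ignores the timers (group_wait, group_interval, repeat_interval), receiver inheritance, and inhibition. The Route class, its field names, and the example receivers and labels are hypothetical; this models the behaviour and is not Alertmanager's own code or config format.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str  # hypothetical: every route names its receiver explicitly here
    matchers: dict[str, str] = field(default_factory=dict)  # label == value only
    group_by: tuple[str, ...] | None = None  # None means "inherit from parent"
    continue_matching: bool = False  # mirrors Alertmanager's `continue: true`
    routes: list[Route] = field(default_factory=list)


def matches(route: Route, labels: dict[str, str]) -> bool:
    return all(labels.get(name) == value for name, value in route.matchers.items())


def resolve(node: Route, labels: dict[str, str], group_by: tuple[str, ...] = ()) -> list[tuple[str, tuple[str, ...]]]:
    """Depth-first walk: return (receiver, group_by) for each route that handles the alert."""
    if node.group_by is not None:
        group_by = node.group_by
    handled: list[tuple[str, tuple[str, ...]]] = []
    for child in node.routes:
        if matches(child, labels):
            handled.extend(resolve(child, labels, group_by))
            if not child.continue_matching:
                break  # first matching sibling wins unless it sets continue
    if not handled:
        # No child matched, so this node handles the alert itself.
        handled.append((node.receiver, group_by))
    return handled


def group_alerts(root: Route, alerts: list[dict[str, str]]) -> dict:
    """Bucket alerts into notification groups keyed by (receiver, group_by label values)."""
    groups: dict = defaultdict(list)
    for labels in alerts:
        for receiver, group_by in resolve(root, labels):
            key = tuple((name, labels.get(name, "")) for name in group_by)
            groups[(receiver, key)].append(labels)
    return dict(groups)


# Example tree and alerts (all names are made up for illustration).
root = Route(
    receiver="default-email",
    group_by=("alertname", "cluster"),
    routes=[
        Route(receiver="pagerduty", matchers={"severity": "critical"}),
        Route(receiver="db-slack", matchers={"team": "db"}),
    ],
)
alerts = [
    {"alertname": "HighErrorRate", "cluster": "eu1", "severity": "critical", "team": "db"},
    {"alertname": "HighErrorRate", "cluster": "eu1", "severity": "critical", "team": "web"},
    {"alertname": "DiskFilling", "cluster": "us1", "severity": "warning", "team": "db"},
    {"alertname": "DiskFilling", "cluster": "us1", "severity": "warning", "team": "web"},
]
for (receiver, key), members in group_alerts(root, alerts).items():
    print(receiver, dict(key), len(members))
# pagerduty {'alertname': 'HighErrorRate', 'cluster': 'eu1'} 2
# db-slack {'alertname': 'DiskFilling', 'cluster': 'us1'} 1
# default-email {'alertname': 'DiskFilling', 'cluster': 'us1'} 1
```

Both critical HighErrorRate alerts end up in a single pagerduty notification because they share alertname and cluster, while the warning alert for team web matches no child route and falls back to the root receiver.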

L2 (2 questions)

1. What makes a good alert? Give criteria.

1. Actionable (someone needs to do something).
2. Urgent (can't wait until morning).
3. Real (low false-positive rate).
4. Includes a runbook link.
5. Symptom-based: alert on symptoms rather than causes, e.g. on error rate, not CPU.

2. What is the RED method and how do you implement it?

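RED = Rate (requests/sec), Errors (failed requests/sec), Duration (latency distribution). It applies to any request-driven service. Implementation: instrument each service with a request counter labeled by status (so failed requests can be selected) and a histogram for duration. PromQL for one service:
Rate = sum(rate(http_requests_total[5m]))
Errors = sum(rate(http_requests_total{status=~"5.."}[5m])). As an error ratio, divide this by sum(rate(http_requests_total[5m])); the sum() on both sides drops the status label, without which the division would only pair series with identical status values.
Duration = histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
Dashboard: one RED panel per service.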
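To make the arithmetic behind these queries concrete, here is a small Python sketch that computes Rate, Errors, the error ratio, and a latency quantile from two hypothetical counter snapshots taken 5 minutes apart. The per_second helper is a simplification: Prometheus's rate() also handles counter resets and extrapolates to the window edges. The histogram_quantile function follows the linear-interpolation approach Prometheus uses within a bucket and assumes 0 < q <= 1. All sample numbers are made up.

```python
import math

WINDOW_SECONDS = 300.0  # the [5m] range used in the PromQL above


def per_second(before: dict, after: dict) -> dict:
    """Naive rate(): counter increase divided by the window. Ignores resets and extrapolation."""
    return {key: (after[key] - before[key]) / WINDOW_SECONDS for key in after}


def histogram_quantile(q: float, buckets: dict) -> float:
    """Interpolate linearly inside cumulative `le` buckets (keys are upper bounds, math.inf last)."""
    bounds = sorted(buckets)
    counts = [buckets[b] for b in bounds]
    if counts[-1] == 0:
        return math.nan  # no observations in the window
    rank = q * counts[-1]  # counts[-1] is the +Inf bucket, i.e. all observations
    # First finite bucket whose cumulative count reaches the rank.
    idx = next((i for i in range(len(bounds) - 1) if counts[i] >= rank), len(bounds) - 1)
    if idx == len(bounds) - 1:
        return bounds[-2]  # rank falls in the +Inf bucket: report the highest finite bound
    start = bounds[idx - 1] if idx > 0 else 0.0
    below = counts[idx - 1] if idx > 0 else 0.0
    return start + (bounds[idx] - start) * ((rank - below) / (counts[idx] - below))


# Hypothetical snapshots of http_requests_total (by status) and the duration histogram.
requests_t0 = {"200": 10_000, "500": 100}
requests_t1 = {"200": 12_850, "500": 250}
latency_t0 = {0.1: 5_000, 0.25: 8_000, 0.5: 9_500, 1.0: 10_000, math.inf: 10_100}
latency_t1 = {0.1: 6_500, 0.25: 10_400, 0.5: 12_350, 1.0: 12_970, math.inf: 13_100}

by_status = per_second(requests_t0, requests_t1)
rate = sum(by_status.values())  # like sum(rate(http_requests_total[5m]))
errors = sum(v for status, v in by_status.items() if status.startswith("5"))  # status=~"5.."
error_ratio = errors / rate
p90 = histogram_quantile(0.9, per_second(latency_t0, latency_t1))
print(rate, errors, round(error_ratio, 3), round(p90, 3))  # 10.0 0.5 0.05 0.417
```

Summing across status values before dividing is what keeps the ratio meaningful: dividing each 5xx series by itself would yield 1.0 for every such series.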