Pattern: Missing `absent()` Alert¶

ID: FP-042 Family: Observability Gap Frequency: Common Blast Radius: Single Service (monitoring blind spot) Detection Difficulty: Actively Misleading

The Shape¶

Most Prometheus alerts are threshold-based: "alert if metric > N." These alerts require the metric to exist. If the service crashes and stops emitting metrics, the metric disappears from Prometheus entirely. Threshold-based alerts return "no data" (not "firing"). The service is completely down; all alerts are silent. Without absent() checks that detect metric disappearance, a complete service failure can go undetected until users report it.

How You'll See It¶

In Kubernetes¶

Service crashes (OOMKill, image pull error, node failure). Prometheus scrapes the pod's /metrics endpoint: connection refused. Prometheus records no data points for that target (as opposed to a "0" value). Alert rule error_rate > 0.01 evaluates to "no data" because there is no error rate to evaluate. Alert doesn't fire. Service is down; monitoring shows nothing.

# Alert that doesn't fire when metrics disappear:
- alert: HighErrorRate
  expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.01

# Alert that fires when metrics disappear:
- alert: ServiceDown
  expr: absent(up{job="myapp"} == 1)
  for: 2m

In Linux/Infrastructure¶

A Prometheus scrape target's up metric goes to 0 (scrape failed). No alerts check up == 0. Prometheus's own up metric change is visible in dashboards but not in alert rules. The service goes down; dashboards are red; pages are silent.

In CI/CD¶

A metrics exporter in a CI step crashes. No metrics are emitted. CI pipeline dashboard (Grafana) shows blank panels. Engineers interpret blank panels as "no data to display" (a Grafana display option) rather than "no metrics available." The exporter failure is invisible.

The Tell¶

Service is down. Grafana dashboards show blank panels (no data). Prometheus up{job="myapp"} is 0 or the metric is absent. No alerts have fired despite the service being completely unavailable. Absence of metrics is the signal; threshold-based alerts cannot detect absence.

Common Misdiagnosis¶

Looks Like	But Actually	How to Tell the Difference
Alert system working correctly	Absent metric not monitored	Check Prometheus `up` metric; check if alerts fire when the target is removed
Service degraded but not down	Service completely down; metrics absent	No data is different from low-data; blank Grafana panel vs zero-valued panel
Prometheus issue	Target issue	Other targets scraping fine; only this target's `up` is 0

The Fix (Generic)¶

Immediate: Add an alert on absent(up{job="myapp"}) or up{job="myapp"} == 0 for all critical services.
Short-term: For every critical metric, add a companion absent() alert; set for: 2m to avoid alert flap on brief scrape failures.
Long-term: Establish a monitoring coverage checklist: every service must have at minimum an up alert, a latency alert, and an error-rate alert — with absent() guarding the scrape.

Real-World Examples¶

Example 1: Payment service OOMKilled at 2am. Metrics disappeared. All threshold alerts: "no data." Service was down for 47 minutes before a user called support. absent() alert would have fired within 2 minutes.
Example 2: Prometheus scrape configuration had a typo after a refactor: job name changed from myapp to my-app. All existing alerts used job="myapp". Metrics still emitted but no alerts could fire (label mismatch). Discovered during an outage that should have paged.

War Story¶

We had 15 carefully crafted Prometheus alerts. Error rate alert. Latency alert. Saturation alert. The whole SRE book. Then our service died at 3am. We woke up to 200 user emails. Zero alerts had fired. The pod had OOMKilled; metrics were gone; all our alerts evaluated to "no data." We'd spent weeks tuning thresholds and zero time on "is the service even emitting metrics?" Added absent() alerts for all critical jobs that same morning. Within a month, we'd caught 3 more undetected outages via absent alerts.

Cross-References¶

Topic Packs: observability-deep-dive, alerting-rules
Footguns: observability-deep-dive/footguns.md — "Missing absent() alert"
Related Patterns: FP-041 (alerting on restart — complementary alert quality issue), FP-024 (health check lying — health check passes but service broken)

Pattern: Missing absent() Alert¶