Portal | Level: L2: Operations | Topics: Prometheus | Domain: Observability
Scenario: Prometheus Says Target Down¶
The Prompt¶
"Our Grafana dashboards suddenly show 'No data' for application metrics. Prometheus targets page shows our app target is missing entirely. The app is running fine — users can access it. What happened?"
Initial Report¶
Developer Slack message: "All our Grafana dashboards are blank since about 10:30 AM. The app is fine — users can log in — but we have zero visibility into metrics. We're flying blind."
Constraints¶
- Time pressure: You have 15 minutes before the next escalation. Without metrics, the team cannot detect further issues.
- Limited access: You have read access to the monitoring namespace but cannot restart Prometheus directly. Port-forwarding is available.
Observable Evidence¶
- Dashboard: All application panels in Grafana show "No data". Infrastructure panels (node CPU, etc.) still work.
- Prometheus /targets: The application scrape target is missing entirely from the targets list.
- Logs: Prometheus logs show no errors related to scraping — the target simply is not in its configuration.
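The "target missing entirely" symptom can also be confirmed from the Prometheus HTTP API rather than the UI. A minimal sketch, with a canned JSON sample standing in for what `curl -s http://localhost:9090/api/v1/targets` would return over the port-forward (the pool name shown is illustrative):

```shell
# List the scrape pools Prometheus has actually built from its config.
# In practice the JSON would come from: curl -s http://localhost:9090/api/v1/targets
sample='{"status":"success","data":{"activeTargets":[{"scrapePool":"serviceMonitor/monitoring/node-exporter/0","health":"up"}]}}'
echo "$sample" | grep -o '"scrapePool":"[^"]*"'
# → "scrapePool":"serviceMonitor/monitoring/node-exporter/0"
# If the app's pool is absent from this list, Prometheus never generated a
# scrape config for it: a service-discovery problem, not a failed scrape.
```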
Expected Investigation Path¶
# 1. Confirm the app is running
kubectl get pods -n grokdevops
kubectl port-forward svc/grokdevops -n grokdevops 8000:80 &
curl http://localhost:8000/metrics
# 2. Check ServiceMonitor
kubectl get servicemonitor -n grokdevops
kubectl get servicemonitor grokdevops -n grokdevops -o yaml
# 3. Compare ServiceMonitor selector with service labels
kubectl get svc grokdevops -n grokdevops --show-labels
# 4. Check Prometheus config
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090 &
# → open http://localhost:9090/targets and http://localhost:9090/config
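Step 3 above is where the discovery chain usually breaks. A hypothetical sketch of the two objects involved, using the scenario's `grokdevops` names with illustrative label values: the ServiceMonitor's `spec.selector.matchLabels` must match the Service's labels exactly for a scrape config to be generated.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: grokdevops
  namespace: grokdevops
spec:
  selector:
    matchLabels:
      app: grokdevops        # must equal the Service's 'app' label
  endpoints:
    - port: http             # must name a port defined on the Service
      interval: 60s
---
apiVersion: v1
kind: Service
metadata:
  name: grokdevops
  namespace: grokdevops
  labels:
    app: grokdevops          # renaming this (e.g. during a Helm upgrade) silently drops the target
spec:
  ports:
    - name: http
      port: 80
```

If the `app` label on the Service changes while the ServiceMonitor keeps the old value, Prometheus simply stops generating the scrape job, with no error logged, which matches the evidence above.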
Strong Answer¶
"If the app is healthy but Prometheus lost the target, the issue is in the service discovery chain, not the app itself. I'd check three things in order: First, the ServiceMonitor — does it still exist and does its selector.matchLabels match the service's labels? A label change during a Helm upgrade or manual edit could break the match. Second, I'd verify the service has endpoints — if pods aren't ready, the service has no IPs for Prometheus to scrape. Third, I'd check that Prometheus is configured to watch the app's namespace — with serviceMonitorSelectorNilUsesHelmValues=false, it watches all namespaces, but if that changed, it might not see our ServiceMonitor."
Common Traps¶
- Assuming Prometheus is broken — the app works fine, it's the scrape config that's wrong
- Not understanding ServiceMonitor — it's the bridge between Service and Prometheus
- Forgetting about label selectors — the most common cause is a label mismatch
- Ignoring the ~60s scrape interval — after a fix, the target takes up to a scrape cycle to reappear
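The label-mismatch trap can be checked with a quick comparison. This is a sketch with hard-coded hypothetical values standing in for the two `kubectl` outputs from the investigation steps:

```shell
# Compare the ServiceMonitor's selector with the Service's actual labels.
# In a live cluster these values would come from:
#   kubectl get servicemonitor grokdevops -n grokdevops -o jsonpath='{.spec.selector.matchLabels.app}'
#   kubectl get svc grokdevops -n grokdevops -o jsonpath='{.metadata.labels.app}'
selector_app="grokdevops"       # what the ServiceMonitor selects (hypothetical)
service_app="grokdevops-web"    # what the Service actually carries (hypothetical)
if [ "$selector_app" = "$service_app" ]; then
  echo "labels match: discovery should pick the target up within a scrape cycle"
else
  echo "label mismatch: selector wants app=$selector_app, Service has app=$service_app"
fi
```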
Practice and Links¶
- Lab: training/interactive/runtime-labs/lab-runtime-03-observability-target-down/
- Runbook: training/library/runbooks/prometheus_target_down.md
- Quest: training/interactive/exercises/levels/level-50/k8s-monitoring/
Wiki Navigation¶
Related Content¶
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Prometheus
- Alerting Rules (Topic Pack, L2) — Prometheus
- Alerting Rules Drills (Drill, L2) — Prometheus
- Capacity Planning (Topic Pack, L2) — Prometheus
- Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention (Case Study, L2) — Prometheus
- Case Study: Grafana Dashboard Empty — Prometheus Blocked by NetworkPolicy (Case Study, L2) — Prometheus
- Datadog Flashcards (CLI) (flashcard_deck, L1) — Prometheus
- Incident Simulator (18 scenarios) (CLI) (Exercise Set, L2) — Prometheus
- Lab: Prometheus Target Down (CLI) (Lab, L2) — Prometheus
- Monitoring Flashcards (CLI) (flashcard_deck, L1) — Prometheus
Pages that link here¶
- Alerting Rules - Skill Check
- Capacity Planning - Primer
- Interview Gauntlet: Alerts Firing but System Seems Fine
- Interview Gauntlet: Monitoring Stack from Scratch
- Interview Gauntlet: eBPF for Observability
- Interview Scenarios
- Level 5: SRE & Incident Response
- Log Analysis & Alerting Rules (PromQL / LogQL) - Primer
- Monitoring Migration (Legacy to Modern)
- Observability
- OpenTelemetry - Primer
- PromQL Drills
- Prometheus Deep Dive - Primer
- Runbook: Alert Storm (Flapping / Too Many Alerts)
- Runbook: Prometheus Target Down