Solution: Lab Runtime 03 -- Observability Target Down

SPOILER WARNING: Try to solve it yourself first. Use hints progressively.


Hint Ladder

Hint 1: Prometheus discovers targets through ServiceMonitor resources. If a target is down, the ServiceMonitor may be misconfigured.

Hint 2: Check the ServiceMonitor's label selector. Compare it to the actual labels on the grokdevops Service.

Hint 3: Run kubectl get servicemonitor -n monitoring -o yaml | grep -A5 selector and kubectl get svc grokdevops -n grokdevops --show-labels. Do the labels match?

Hint 4: The break script modified the ServiceMonitor selector to use non-matching labels. Restore the correct selector or run ./fix.sh.


Minimal Solution

# Identify the mismatch
kubectl get servicemonitor -n monitoring -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.selector.matchLabels}{"\n"}{end}'
kubectl get svc grokdevops -n grokdevops --show-labels

# Fix by restoring correct labels (or run ./fix.sh)
./fix.sh
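If you'd rather repair the ServiceMonitor by hand than run ./fix.sh, a merge patch along these lines does it. The ServiceMonitor name and the app: grokdevops label are assumptions here; substitute whatever the two commands above actually report.

```shell
# Hypothetical manual fix: point the ServiceMonitor back at the Service's
# real labels. Replace "grokdevops" and app=grokdevops with the actual
# name and labels shown by the identify-the-mismatch commands above.
kubectl patch servicemonitor grokdevops -n monitoring --type merge \
  -p '{"spec":{"selector":{"matchLabels":{"app":"grokdevops"}}}}'

# Confirm the selector now mirrors the Service labels; the target should
# reappear in Prometheus within roughly one scrape interval.
kubectl get servicemonitor grokdevops -n monitoring \
  -o jsonpath='{.spec.selector.matchLabels}'
```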

Explain

Symptom: Prometheus targets page shows the grokdevops target as DOWN or missing entirely.

Evidence: the ServiceMonitor's spec.selector.matchLabels doesn't match the Service's labels. The Prometheus Operator uses these labels to generate scrape configs.

Root cause: the Prometheus Operator watches ServiceMonitor resources and generates Prometheus scrape configs from them. Each ServiceMonitor specifies a label selector that must match one or more Services. When the labels don't match, the Operator never generates a scrape config for that target, so it appears in Prometheus as missing/down.
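The matching rule itself is simple: every key=value pair in matchLabels must also appear among the Service's labels (extra Service labels are fine). A minimal sketch of that subset check, using hypothetical label sets flattened to "key=value" lists:

```shell
# Sketch of matchLabels semantics: the selector matches only if every
# selector pair is present in the Service's label set. Both label sets
# here are hypothetical examples.
selector="app=grokdevops"
service_labels="app=grokdevops,team=platform"

match=true
IFS=','
for pair in $selector; do
  case ",$service_labels," in
    *",$pair,"*) ;;          # pair found among the Service labels
    *) match=false ;;        # any missing pair breaks the match
  esac
done
echo "selector matches service: $match"   # -> selector matches service: true
```

Change the selector to a pair the Service doesn't carry (as the break script did) and the result flips to false, which is exactly why the scrape config disappears.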

Key insight: There are two label matching steps: (1) Prometheus operator must find the ServiceMonitor (via serviceMonitorSelector on the Prometheus CR), and (2) the ServiceMonitor must find the Service (via spec.selector.matchLabels). Either can fail.
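Both hops can be checked directly. The monitoring and grokdevops namespaces follow the lab's conventions; anything else below is an assumption to adapt to your cluster.

```shell
# Step 1: does the Prometheus CR's serviceMonitorSelector pick up the
# ServiceMonitor? Compare the selector to the ServiceMonitor's own labels.
kubectl get prometheus -n monitoring \
  -o jsonpath='{.items[*].spec.serviceMonitorSelector}'
kubectl get servicemonitor -n monitoring --show-labels

# Step 2: does the ServiceMonitor's matchLabels pick up the Service?
kubectl get servicemonitor -n monitoring \
  -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.selector.matchLabels}{"\n"}{end}'
kubectl get svc grokdevops -n grokdevops --show-labels
```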


Prevent

  • Use helm template to verify ServiceMonitor selectors before deploying
  • Add alerting rule for when expected targets disappear
  • Keep ServiceMonitor in the same Helm chart as the app so labels stay in sync
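For the alerting bullet, a rule built on absent() catches a target that has vanished from discovery entirely (a plain up == 0 alert only fires while the target is still being scraped). The job label, rule names, and thresholds below are assumptions; adjust them to the lab's actual scrape config.

```shell
# Hypothetical rule file: alert when the grokdevops job disappears from
# Prometheus entirely. absent() returns 1 when no matching series exists.
cat > target-missing.rules.yml <<'EOF'
groups:
  - name: target-presence
    rules:
      - alert: GrokdevopsTargetMissing
        expr: absent(up{job="grokdevops"})
        for: 10m
        labels:
          severity: warning
EOF

# Validate the rule file offline if promtool is available.
command -v promtool >/dev/null \
  && promtool check rules target-missing.rules.yml || true
```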
