Solution: Lab Runtime 03 -- Observability Target Down¶
SPOILER WARNING: Try to solve it yourself first. Use hints progressively.
Hint Ladder¶
Hint 1: Prometheus discovers targets through ServiceMonitor resources. If a target is down, the ServiceMonitor may be misconfigured.
Hint 2: Check the ServiceMonitor's label selector. Compare it to the actual labels on the grokdevops Service.
Hint 3: Run kubectl get servicemonitor -n monitoring -o yaml | grep -A5 selector and kubectl get svc grokdevops -n grokdevops --show-labels. Do the labels match?
Hint 4: The break script modified the ServiceMonitor selector to use non-matching labels. Restore the correct selector or run ./fix.sh.
Minimal Solution¶
# Identify the mismatch
kubectl get servicemonitor -n monitoring -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.selector.matchLabels}{"\n"}{end}'
kubectl get svc grokdevops -n grokdevops --show-labels
# Fix by restoring the correct selector (or run ./fix.sh)
./fix.sh
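If you'd rather repair it by hand than run the script, a patch of this shape restores the selector. The ServiceMonitor name and the app: grokdevops label are assumptions here; substitute the name from the jsonpath output and the labels that --show-labels actually reports:

# Manual alternative -- assumed ServiceMonitor name and label; use what the commands above report
kubectl patch servicemonitor grokdevops -n monitoring --type merge \
  -p '{"spec":{"selector":{"matchLabels":{"app":"grokdevops"}}}}'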
Explain¶
Symptom: Prometheus targets page shows the grokdevops target as DOWN or missing entirely.
Evidence: The ServiceMonitor's spec.selector.matchLabels doesn't match the labels on the grokdevops Service. The Prometheus Operator uses that selector to decide which Services to generate scrape configs for.
Root cause: The Prometheus Operator discovers scrape targets through ServiceMonitor resources. Each ServiceMonitor carries a label selector that must match a Service; when the labels don't match, no scrape config is ever generated for that target, so it appears as missing or down.
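As a minimal sketch of that relationship, assuming the Service carries app: grokdevops (take the real key and value from --show-labels), the two objects line up like this:

# Service in the grokdevops namespace -- these labels are what the ServiceMonitor selects on
apiVersion: v1
kind: Service
metadata:
  name: grokdevops
  namespace: grokdevops
  labels:
    app: grokdevops            # assumed label; verify with --show-labels
spec:
  ports:
    - name: http               # port name referenced by the ServiceMonitor endpoint
      port: 8080
---
# ServiceMonitor in the monitoring namespace -- its selector must match the Service labels above
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: grokdevops
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # often required by the Prometheus CR's serviceMonitorSelector (step 1 below)
spec:
  namespaceSelector:
    matchNames:
      - grokdevops
  selector:
    matchLabels:
      app: grokdevops          # must equal the Service's labels, or no scrape config is generated
  endpoints:
    - port: http               # must name an existing port on the Service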
Key insight: There are two label matching steps: (1) Prometheus operator must find the ServiceMonitor (via serviceMonitorSelector on the Prometheus CR), and (2) the ServiceMonitor must find the Service (via spec.selector.matchLabels). Either can fail.
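The first step lives on the Prometheus custom resource itself; in a typical kube-prometheus-stack install it selects ServiceMonitors by a release label, sketched here with assumed names and values:

# Prometheus CR (managed by the operator) -- step (1): which ServiceMonitors get picked up at all
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack   # ServiceMonitors without this label are silently ignored
  serviceMonitorNamespaceSelector: {}  # empty selector = look in all namespaces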
Prevent¶
- Use helm template to verify ServiceMonitor selectors before deploying
- Add an alerting rule for when expected targets disappear (see the sketch after this list)
- Keep the ServiceMonitor in the same Helm chart as the app so labels stay in sync