Level: L2: Operations | Topics: Prometheus, Loki | Domain: Observability

Observability Drills

Remember: The observability debugging flow: Alert fires -> check Grafana dashboard (what metric is off?) -> check Prometheus (what changed?) -> check Loki logs (why did it change?) -> check Tempo traces (where in the request path?). Each tool answers a different question. Jumping straight to logs without checking metrics first wastes time on red herrings.
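As a rough sketch of that flow, each step after the dashboard maps to one HTTP API call. The base URLs assume each service has been port-forwarded to its default port on localhost, the function names are hypothetical, and the `namespace` log label is an assumption about how Promtail labels streams in your cluster.

```shell
# Assumed base URLs: each service port-forwarded locally on its default port.
PROM_URL="${PROM_URL:-http://localhost:9090}"
LOKI_URL="${LOKI_URL:-http://localhost:3100}"
TEMPO_URL="${TEMPO_URL:-http://localhost:3200}"

# Metrics: which targets are failing their scrape right now?
prom_down_targets() {
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode 'query=up == 0'
}

# Logs: recent lines containing "error" in one namespace.
# Assumes log streams carry a "namespace" label.
loki_recent_errors() {
  ns="$1"
  curl -sG "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query={namespace=\"$ns\"} |= \"error\"" \
    --data-urlencode 'limit=20'
}

# Traces: fetch a single trace by its ID.
tempo_trace() {
  curl -s "$TEMPO_URL/api/traces/$1"
}
```

Calling `prom_down_targets` first and only then `loki_recent_errors monitoring` keeps the metrics-before-logs order described above.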

Gotcha: If a Prometheus target is missing from the targets page or shows as DOWN, a common cause is a label selector mismatch between the ServiceMonitor and the actual Service. Run kubectl get servicemonitor -o yaml and compare its spec.selector.matchLabels to the Service's metadata.labels: every key/value pair in matchLabels must be present on the Service, spelled identically. A single typo means the Service is never selected, and nothing reports an obvious error.
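The comparison can be scripted. The sketch below is one possible approach, not the reference answer: both function names are hypothetical, it assumes `kubectl` and `jq` are installed, and it only handles `matchLabels` (not `matchExpressions`).

```shell
# selector_matches SELECTOR LABELS
# Both arguments are comma-separated key=value lists, e.g. "app=web,tier=api".
# Returns 0 when every selector pair appears in LABELS; otherwise returns 1
# and prints each missing pair to stderr.
selector_matches() {
  local selector="$1" labels="$2" pair missing=0
  local IFS=','
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;
      *) echo "missing on Service: $pair" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}

# check_servicemonitor SERVICEMONITOR SERVICE NAMESPACE
# Pulls both label sets from the cluster and compares them.
check_servicemonitor() {
  local sm="$1" svc="$2" ns="$3" selector labels
  selector=$(kubectl get servicemonitor "$sm" -n "$ns" -o json |
    jq -r '.spec.selector.matchLabels // {} | to_entries | map("\(.key)=\(.value)") | join(",")')
  labels=$(kubectl get service "$svc" -n "$ns" -o json |
    jq -r '.metadata.labels // {} | to_entries | map("\(.key)=\(.value)") | join(",")')
  selector_matches "$selector" "$labels"
}
```

The check is a subset test rather than strict equality because a Service may carry extra labels that the ServiceMonitor does not select on.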

15 drills for Prometheus, Loki, Grafana, and Tempo operations. Each takes 1-5 minutes.

Difficulty: [E] Easy (recall) | [I] Intermediate (combine flags/tools) | [H] Hard (multi-step debugging)


Drill 1: Check Prometheus targets [I]

Question: List all Prometheus scrape targets and find which ones are DOWN.

# Your command here
Relevant runbook: training/library/runbooks/prometheus_target_down.md
Answer: answers/obs_answers.md


Drill 2: Check if metrics-server is running [E]

Question: Verify that the metrics-server pod is running and the metrics API is available.

# Your command here
Relevant runbook: training/library/runbooks/kubernetes/hpa_not_scaling.md
Answer: answers/obs_answers.md


Drill 3: Port-forward to Grafana [E]

Question: Forward local port 3000 to the Grafana service in the monitoring namespace.

# Your command here
Answer: answers/obs_answers.md


Drill 4: Check Promtail pods [E]

Question: List all Promtail pods. Verify they are running on every node.

# Your command here
Relevant runbook: training/library/runbooks/observability/loki_no_logs.md
Answer: answers/obs_answers.md


Drill 5: Query Prometheus directly [I]

Question: Port-forward to Prometheus and query the up metric to see which targets are healthy.

# Your command here
Answer: answers/obs_answers.md


Drill 6: Check ServiceMonitor labels [I]

Question: Verify that the grokdevops ServiceMonitor's selector matches the actual service labels.

# Your commands here
Relevant runbook: training/library/runbooks/prometheus_target_down.md
Answer: answers/obs_answers.md


Drill 7: Find Loki data source [I]

Question: Check if Loki is configured as a data source in Grafana (via CLI or API).

# Your command here
Answer: answers/obs_answers.md


Drill 8: Check Promtail config [I]

Question: View the Promtail configuration to see which log paths it's scraping.

# Your command here
Answer: answers/obs_answers.md


Drill 9: Check Tempo pods [E]

Question: Verify that Tempo is running and accessible in the monitoring namespace.

# Your command here
Relevant runbook: training/library/runbooks/observability/tempo_no_traces.md
Answer: answers/obs_answers.md


Drill 10: Identify why a target is down [H]

Question: A Prometheus target shows as DOWN. Find the ServiceMonitor and compare its selector to the service's labels.

# Your commands here
Relevant lab: training/interactive/runtime-labs/lab-runtime-03-observability-target-down/
Answer: answers/obs_answers.md


Drill 11: Check Prometheus rules [E]

Question: List all PrometheusRule resources in the monitoring namespace.

# Your command here
Answer: answers/obs_answers.md


Drill 12: View Prometheus config [I]

Question: Check the Prometheus configuration to see scrape intervals and rule files.

# Your command here
Answer: answers/obs_answers.md


Drill 13: Check metrics endpoint [E]

Question: Verify that the grokdevops app exposes a /metrics endpoint.

# Your command here
Answer: answers/obs_answers.md


Drill 14: View Grafana dashboards list [I]

Question: List available Grafana dashboards via the API.

# Your command here
Answer: answers/obs_answers.md


Drill 15: Full observability health check [H]

Question: In one sequence, verify that Prometheus, Loki, Promtail, Tempo, and Grafana are all running.

# Your commands here
Answer: answers/obs_answers.md

