Level: L2: Operations | Topics: Prometheus, Loki | Domain: Observability

Observability Drills

Remember: The observability debugging flow runs in a fixed order: alert fires -> check the Grafana dashboard (which metric is off?) -> check Prometheus (what changed?) -> check Loki logs (why did it change?) -> check Tempo traces (where in the request path?). Each tool answers a different question. Jumping straight to logs without checking metrics first wastes time on red herrings.

Gotcha: If a Prometheus target is missing or shows as DOWN, a common cause is a label selector mismatch between the ServiceMonitor and the actual Service. Run kubectl get servicemonitor -o yaml and compare its spec.selector.matchLabels to the Service's metadata.labels: every key/value pair in matchLabels must appear on the Service, spelled identically (the Service may carry extra labels). A single typo means the Service is never selected, and nothing reports an obvious error.
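
The comparison described in this gotcha can be sketched as a small shell helper. This is only an illustration: the function name selector_matches and the label values (app=grokdevops, tier=web) are hypothetical, and it assumes both sides have already been written out as comma-separated key=value lists, which is the format kubectl get svc --show-labels uses in its LABELS column.

```shell
# selector_matches SELECTOR SERVICE_LABELS
# Both arguments are comma-separated key=value lists (an assumed input
# format, chosen to match what `kubectl get svc --show-labels` prints).
# Prints "match" when every selector pair is present on the Service,
# otherwise prints the first missing pair and returns 1.
selector_matches() {
  selector=$1
  service_labels=",$2,"   # pad with commas so each pair is matched whole
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case "$service_labels" in
      *",$pair,"*) ;;      # this pair exists on the Service
      *)
        IFS=$old_ifs
        echo "missing: $pair"
        return 1
        ;;
    esac
  done
  IFS=$old_ifs
  echo "match"
}

# Extra labels on the Service are fine; the selector only has to be a subset.
selector_matches "app=grokdevops" "app=grokdevops,tier=web"          # prints: match

# A one-character typo in the selector is enough to miss the Service.
selector_matches "app=grokdevop" "app=grokdevops,tier=web" || true   # prints: missing: app=grokdevop
```

The helper only checks matchLabels-style equality; it does not cover matchExpressions or namespace selection, which would need separate checks.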

15 drills for Prometheus, Loki, Grafana, and Tempo operations. Each takes 1-5 minutes.

Difficulty: [E] Easy (recall) | [I] Intermediate (combine flags/tools) | [H] Hard (multi-step debugging)

Drill 1: Check Prometheus targets [I]

Question: List all Prometheus scrape targets and find which ones are DOWN.

# Your command here

Relevant runbook: training/library/runbooks/prometheus_target_down.md

Answer: answers/obs_answers.md

Drill 2: Check if metrics-server is running [E]

Question: Verify that the metrics-server pod is running and the metrics API is available.

# Your command here

Relevant runbook: training/library/runbooks/kubernetes/hpa_not_scaling.md

Answer: answers/obs_answers.md

Drill 3: Port-forward to Grafana [E]

Question: Forward local port 3000 to the Grafana service in the monitoring namespace.

# Your command here

Answer: answers/obs_answers.md

Drill 4: Check Promtail pods [E]

Question: List all Promtail pods. Verify they are running on every node.

# Your command here

Relevant runbook: training/library/runbooks/observability/loki_no_logs.md

Answer: answers/obs_answers.md

Drill 5: Query Prometheus directly [I]

Question: Port-forward to Prometheus and query the up metric to see which targets are healthy.

# Your command here

Answer: answers/obs_answers.md

Drill 6: Check ServiceMonitor labels [I]

Question: Verify that the grokdevops ServiceMonitor's selector matches the actual service labels.

# Your commands here

Relevant runbook: training/library/runbooks/prometheus_target_down.md

Answer: answers/obs_answers.md

Drill 7: Find Loki data source [I]

Question: Check if Loki is configured as a data source in Grafana (via CLI or API).

# Your command here

Answer: answers/obs_answers.md

Drill 8: Check Promtail config [I]

Question: View the Promtail configuration to see which log paths it's scraping.

# Your command here

Answer: answers/obs_answers.md

Drill 9: Check Tempo pods [E]

Question: Verify that Tempo is running and accessible in the monitoring namespace.

# Your command here

Relevant runbook: training/library/runbooks/observability/tempo_no_traces.md

Answer: answers/obs_answers.md

Drill 10: Identify why a target is down [H]

Question: A Prometheus target shows as DOWN. Find the ServiceMonitor and compare its selector to the service's labels.

# Your commands here

Relevant lab: training/interactive/runtime-labs/lab-runtime-03-observability-target-down/

Answer: answers/obs_answers.md

Drill 11: Check Prometheus rules [E]

Question: List all PrometheusRule resources in the monitoring namespace.

# Your command here

Answer: answers/obs_answers.md

Drill 12: View Prometheus config [I]

Question: Check the Prometheus configuration to see scrape intervals and rule files.

# Your command here

Answer: answers/obs_answers.md

Drill 13: Check metrics endpoint [E]

Question: Verify that the grokdevops app exposes a /metrics endpoint.

# Your command here

Answer: answers/obs_answers.md

Drill 14: View Grafana dashboards list [I]

Question: List available Grafana dashboards via the API.

# Your command here

Answer: answers/obs_answers.md

Drill 15: Full observability health check [H]

Question: In one sequence, verify that Prometheus, Loki, Promtail, Tempo, and Grafana are all running.

# Your commands here

Answer: answers/obs_answers.md

Related Content

Incident Simulator (18 scenarios) (CLI) (Exercise Set, L2) — Loki, Prometheus
Observability Architecture (Reference, L2) — Loki, Prometheus
Observability Deep Dive (Topic Pack, L2) — Loki, Prometheus
Skillcheck: Observability (Assessment, L2) — Loki, Prometheus
Track: Observability (Reference, L2) — Loki, Prometheus
Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Prometheus
Alerting Rules (Topic Pack, L2) — Prometheus
Alerting Rules Drills (Drill, L2) — Prometheus
Capacity Planning (Topic Pack, L2) — Prometheus
Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention (Case Study, L2) — Prometheus
