Portal | Level: L2: Operations | Topics: Prometheus, Grafana, Loki, Tempo | Domain: Observability
Track: Observability¶
Covers the Prometheus, Loki, Tempo, and Grafana stack across the three signals: metrics, logs, and traces.
Goals¶
- Understand the three pillars (metrics, logs, traces)
- Configure and troubleshoot Prometheus scrape targets
- Debug the log pipeline (Promtail -> Loki -> Grafana)
- Understand trace propagation (OpenTelemetry -> Tempo)
- Use Grafana for correlation across signals
- Know SLI/SLO concepts and alerting fundamentals
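To make the SLI/SLO goal concrete, here is a minimal sketch of the arithmetic behind a request-based SLO and its error budget. The function name `error_budget` and the numbers in the usage example are illustrative assumptions, not taken from the track materials.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize the SLI and error-budget consumption for a request-based SLO.

    slo_target is a fraction such as 0.999 (99.9% of requests must succeed).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of good requests in the window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: how many failures the SLO tolerates in the same window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (values above 1.0 mean the SLO is breached).
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 failures, 99.9% target.
    summary = error_budget(0.999, 1_000_000, 400)
    print(f"SLI: {summary['sli']:.4%}")
    print(f"Budget consumed: {summary['budget_consumed']:.0%}")
```

In this hypothetical window the service spends roughly 40% of its budget. Alerting on how fast the budget is being consumed, rather than on individual errors, is one common way to connect SLOs to alerting.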
Prerequisites¶
- Concepts: Kubernetes, Service, Deployment, DaemonSet
- make deploy-all completed (observability stack running)
Primary Path (12 steps)¶
- Read: training/library/skillchecks/observability.skillcheck.md — three pillars mental model
- Read: devops/docs/observability.md — stack architecture
- Run: kubectl get pods -n monitoring — verify stack is running
- Run: kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80 — access Grafana
- Study: devops/observability/values/values-prometheus.yaml — Prometheus config
- Study: ServiceMonitor: kubectl get servicemonitor -n grokdevops -o yaml
- Lab: training/interactive/runtime-labs/lab-runtime-03-observability-target-down/ — break/fix Prometheus target
- Read: training/library/runbooks/prometheus_target_down.md — triage procedure
- Lab: training/interactive/runtime-labs/lab-runtime-04-loki-no-logs/ — break/fix log pipeline
- Read: training/library/runbooks/observability/loki_no_logs.md — log pipeline triage
- Read: training/library/runbooks/observability/tempo_no_traces.md — tracing triage
- Study: training/knowledge_architecture/commands/observability_debugging_flow.md — decision tree
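When working through the target-down lab and runbook above, it can help to list unhealthy scrape targets programmatically. The sketch below assumes you have already fetched the JSON returned by Prometheus' `/api/v1/targets` HTTP endpoint; how you reach Prometheus (port-forward, service name) depends on your install and is not shown. The helper name `unhealthy_targets` and the sample payload (jobs, addresses, error text) are hypothetical and trimmed to the fields used here.

```python
from typing import Dict, List


def unhealthy_targets(targets_response: Dict) -> List[Dict[str, str]]:
    """Return job, instance, scrape URL, and last error for each active target not 'up'."""
    if targets_response.get("status") != "success":
        raise ValueError("Prometheus API did not return status 'success'")

    active = targets_response.get("data", {}).get("activeTargets", [])
    problems = []
    for target in active:
        # 'health' is reported as "up", "down", or "unknown".
        if target.get("health") != "up":
            labels = target.get("labels", {})
            problems.append({
                "job": labels.get("job", ""),
                "instance": labels.get("instance", ""),
                "scrapeUrl": target.get("scrapeUrl", ""),
                "health": target.get("health", "unknown"),
                "lastError": target.get("lastError", ""),
            })
    return problems


# Hypothetical, trimmed payload in the shape of a /api/v1/targets response.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "example-app", "instance": "10.0.0.5:8080"},
                "scrapeUrl": "http://10.0.0.5:8080/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
            {
                "labels": {"job": "node-exporter", "instance": "10.0.0.6:9100"},
                "scrapeUrl": "http://10.0.0.6:9100/metrics",
                "health": "up",
                "lastError": "",
            },
        ],
        "droppedTargets": [],
    },
}

if __name__ == "__main__":
    for problem in unhealthy_targets(SAMPLE_RESPONSE):
        print(f"{problem['job']} {problem['instance']}: {problem['lastError']}")
```

A target that is missing from `activeTargets` entirely (rather than present and down) usually points at discovery, for example a ServiceMonitor selector that matches nothing, which this helper cannot detect.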
Optional Deepening¶
- training/interactive/knowledge/data/cards/prometheusstack.tsv — 244 Prometheus/Grafana/Tempo flashcards
- training/interactive/knowledge/data/cards/observability.tsv — observability flashcards
- training/library/interview-scenarios/03-prometheus-target-down.md — interview prep
- training/library/interview-scenarios/04-loki-logs-disappeared.md — interview prep
Wiki Navigation¶
Prerequisites¶
- Track: Kubernetes Core (Reference, L1)
Next Steps¶
- Track: Incident Response (Reference, L2)
Related Content¶
- Observability Architecture (Reference, L2) — Grafana, Loki, Prometheus
- Observability Deep Dive (Topic Pack, L2) — Grafana, Loki, Prometheus
- Skillcheck: Observability (Assessment, L2) — Grafana, Loki, Prometheus
- Incident Simulator (18 scenarios) (CLI) (Exercise Set, L2) — Loki, Prometheus
- Lab: Prometheus Target Down (CLI) (Lab, L2) — Grafana, Prometheus
- Monitoring Fundamentals (Topic Pack, L1) — Grafana, Prometheus
- Monitoring Migration (Legacy to Modern) (Topic Pack, L2) — Grafana, Prometheus
- Observability Drills (Drill, L2) — Loki, Prometheus
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Prometheus
- Alerting Rules (Topic Pack, L2) — Prometheus
Pages that link here¶
- Kubernetes_Core
- Monitoring Fundamentals
- Monitoring Fundamentals - Primer
- Monitoring Migration (Legacy to Modern)
- Observability Architecture
- Observability Debugging Decision Flow
- Observability Drills
- Observability Skillcheck
- Primer
- Runbook: Grafana Dashboard Blank / No Data
- Runbook: Loki Not Receiving Logs
- Runbook: Prometheus Target Down
- Runbook: Tempo Not Receiving Traces
- Scenario: Logs Disappeared from Grafana Loki