Lab 11: Monitoring Stack¶
| Field | Value |
|---|---|
| Tier | 3 — Operations |
| Estimated Time | 60 minutes |
| Prerequisites | k3s cluster, Helm |
| Auto-Grade | Yes |
Scenario¶
Your company just experienced a 4-hour outage that nobody detected until customers complained on social media. The postmortem revealed a critical gap: there is no monitoring, no alerting, and no dashboards. The CTO has given you one week to deploy a monitoring stack. You are starting with the foundation: Prometheus for metrics collection and Grafana for visualization.
You need to deploy Prometheus to scrape metrics from all pods in the cluster, configure alert rules for common failure scenarios (high CPU, pod restarts, node pressure), deploy Grafana with a pre-configured dashboard, and verify the entire pipeline works end-to-end by triggering a test alert.
Objectives¶
- Deploy Prometheus in namespace
lab-monitoringusing a Deployment - Configure Prometheus to scrape all pods with the annotation
prometheus.io/scrape: "true" - Create an alerting rule: fire when any pod has restarted more than 5 times in 10 minutes
- Deploy Grafana with a datasource pointing to Prometheus
- Create a ConfigMap-based dashboard showing pod CPU and memory usage
- Deploy a sample app with metrics endpoint and verify Prometheus scrapes it
- Verify the alert rule appears in Prometheus UI (or API)
Setup¶
Creates namespace lab-monitoring with partial Prometheus config.
Hints¶
Hint 1: Prometheus ConfigMap
Prometheus configuration goes in a ConfigMap mounted at `/etc/prometheus/prometheus.yml`. Key sections: `scrape_configs` with `kubernetes_sd_configs`.Hint 2: Pod annotation-based discovery
Hint 3: Alert rules
Create a separate ConfigMap for rules and reference it in prometheus.yml under `rule_files`. Use `kube_pod_container_status_restarts_total` metric.Hint 4: Grafana datasource
Configure via environment variables or a provisioning ConfigMap at `/etc/grafana/provisioning/datasources/`.Hint 5: Testing the pipeline
Deploy a pod that exposes `/metrics` with Prometheus-format data. Add the `prometheus.io/scrape: "true"` annotation. Check Prometheus targets page.Grading¶
Solution¶
See the solution/ directory for complete manifests.