Observability Architecture¶

Overview¶

The GrokDevOps observability stack provides metrics, logging, and tracing using open-source tools deployed on Kubernetes.

Stack Components¶

| Component | Role | Port |
| --- | --- | --- |
| Prometheus | Metrics collection and storage | 9090 |
| Grafana | Dashboards and visualization | 3000 |
| Alertmanager | Alert routing | 9093 |
| Loki | Log aggregation | 3100 |
| Promtail | Log collection (DaemonSet) | - |
| Tempo | Distributed tracing | 3200 |

Helm Releases¶

All components are installed using curated Helm values files from devops/observability/values/.

| Release Name | Chart | Values File |
| --- | --- | --- |
| `kube-prometheus-stack` | `prometheus-community/kube-prometheus-stack` | `values-prometheus.yaml` |
| `loki` | `grafana/loki` | `values-loki.yaml` |
| `promtail` | `grafana/promtail` | `values-promtail.yaml` |
| `tempo` | `grafana/tempo` | `values-tempo.yaml` |
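As a rough illustration of what the release table implies, the sketch below builds one `helm upgrade --install` command per release. It is not the contents of the installer script, which is not shown on this page. The `monitoring` namespace is an assumption taken from the verification commands, and the `--create-namespace` flag is an illustrative choice.

```python
# Release table from this page: (release name, chart, values file).
RELEASES = [
    ("kube-prometheus-stack", "prometheus-community/kube-prometheus-stack", "values-prometheus.yaml"),
    ("loki", "grafana/loki", "values-loki.yaml"),
    ("promtail", "grafana/promtail", "values-promtail.yaml"),
    ("tempo", "grafana/tempo", "values-tempo.yaml"),
]
VALUES_DIR = "devops/observability/values"
NAMESPACE = "monitoring"  # assumption: namespace used by the verification commands


def helm_commands(releases=RELEASES, values_dir=VALUES_DIR, namespace=NAMESPACE):
    """Build one idempotent `helm upgrade --install` command per release."""
    return [
        f"helm upgrade --install {name} {chart} "
        f"--namespace {namespace} --create-namespace "
        f"-f {values_dir}/{values}"
        for name, chart, values in releases
    ]
```

Printing the result of `helm_commands()` gives four commands that can be compared against what the installer actually runs.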

Data Flow¶

Metrics: Application -> Prometheus -> Grafana¶

FastAPI (/metrics)
    |
    v
ServiceMonitor (selects app by labels)
    |
    v
Prometheus (scrapes every 30s)
    |
    v
Grafana (dashboards, alerts)

The application exposes a /metrics endpoint using prometheus-client. Metrics include:

http_requests_total — counter with labels: method, endpoint, status
http_request_duration_seconds — histogram with labels: method, endpoint
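To make the shape of these metrics concrete, the following standard-library sketch renders a labeled counter in the Prometheus text exposition format and shows how histogram buckets accumulate. The real application relies on prometheus-client to do this; the sample values and bucket bounds here are invented for illustration.

```python
def render_counter(name, samples):
    """Render counter samples as Prometheus text exposition lines.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return lines


def cumulative_buckets(observations, bounds):
    """Count observations per histogram bucket.

    Buckets are cumulative: the bucket with upper bound `le` counts every
    observation less than or equal to that bound, and `+Inf` counts all.
    """
    counts = {str(b): sum(1 for o in observations if o <= b) for b in bounds}
    counts["+Inf"] = len(observations)
    return counts


# Hypothetical sample data, not taken from a running system.
counter_lines = render_counter(
    "http_requests_total",
    [({"method": "GET", "endpoint": "/health", "status": "200"}, 3)],
)
buckets = cumulative_buckets([0.02, 0.2, 1.5], bounds=[0.1, 0.5, 1.0])
```

A histogram such as `http_request_duration_seconds` is exposed as `_bucket` series (one per `le` bound) plus `_sum` and `_count`, which is why latency queries operate on `http_request_duration_seconds_bucket`.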

The Helm chart includes a ServiceMonitor template (gated by monitoring.serviceMonitor.enabled) that tells Prometheus how to scrape the application.
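The label-based selection a ServiceMonitor performs can be sketched as a simple subset check: every key/value pair in the selector's `matchLabels` must be present on the Service. The label keys and values below are hypothetical; the chart's real labels are defined in its templates.

```python
def selector_matches(match_labels, object_labels):
    """Return True when every selector key/value is present on the object."""
    return all(object_labels.get(k) == v for k, v in match_labels.items())


# Hypothetical labels for illustration only.
selector = {"app.kubernetes.io/name": "grokdevops"}
service_labels = {
    "app.kubernetes.io/name": "grokdevops",
    "app.kubernetes.io/instance": "dev",
}
```

If a target never appears in Prometheus, a mismatch between the selector and the Service labels is a common first thing to check.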

Logging: Application -> Promtail -> Loki -> Grafana¶

FastAPI (stdout/stderr)
    |
    v
Promtail (DaemonSet, reads container logs)
    |
    v
Loki (stores and indexes by labels)
    |
    v
Grafana (LogQL queries)

Promtail runs as a DaemonSet on every node. It tails container log files from /var/log/pods and ships them to Loki with Kubernetes metadata labels (namespace, pod, container).
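The relationship between a log file's location and its labels can be illustrated with the kubelet's directory layout, `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart-count>.log`. Promtail itself obtains these labels through Kubernetes service discovery and relabeling rather than by parsing paths, so treat this only as a sketch of the mapping. The pod name and UID below are made up.

```python
from pathlib import PurePosixPath


def labels_from_log_path(path):
    """Derive namespace, pod, and container from a kubelet pod log path."""
    parts = PurePosixPath(path).parts
    # parts: ('/', 'var', 'log', 'pods', '<ns>_<pod>_<uid>', '<container>', '<n>.log')
    pod_dir, container = parts[4], parts[5]
    # Namespaces and pod names cannot contain underscores, so splitting is safe.
    namespace, pod, _uid = pod_dir.split("_", 2)
    return {"namespace": namespace, "pod": pod, "container": container}


# Hypothetical path for illustration only.
example = labels_from_log_path(
    "/var/log/pods/grokdevops_grokdevops-7c9f_0a1b2c3d/app/0.log"
)
```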

Tracing: Application -> Tempo -> Grafana¶

Application (OTLP exporter, future)
    |
    v
Tempo (trace storage)
    |
    v
Grafana (trace visualization)

Tempo is deployed and ready to receive traces, but the application does not emit any yet. Application-side OpenTelemetry instrumentation is a future addition.

Installation¶

The stack is installed with a single canonical script. An equivalent Ansible playbook is also available:

# Install the full observability stack
./devops/scripts/install-observability.sh

# Or via Ansible
cd devops/ansible
ansible-playbook playbooks/install-addons.yml

Both paths use the same values files and Helm release names.

Verification¶

Check all pods are running¶

kubectl -n monitoring get pods
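For scripted checks, the same information is available as JSON via `kubectl -n monitoring get pods -o json`. The sketch below flags pods that are not running or whose containers are not all ready. It assumes the standard PodList JSON structure; the pod names in the sample are invented.

```python
def unhealthy_pods(pod_list):
    """Return names of pods that are not Running with all containers ready.

    Pods in phase Succeeded (completed jobs) are treated as healthy.
    """
    bad = []
    for item in pod_list.get("items", []):
        status = item.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue
        containers = status.get("containerStatuses", [])
        ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not ready:
            bad.append(item["metadata"]["name"])
    return bad


# Hypothetical, trimmed-down PodList for illustration only.
sample = {
    "items": [
        {"metadata": {"name": "loki-0"},
         "status": {"phase": "Running", "containerStatuses": [{"ready": True}]}},
        {"metadata": {"name": "tempo-0"},
         "status": {"phase": "Running", "containerStatuses": [{"ready": False}]}},
        {"metadata": {"name": "promtail-x1"},
         "status": {"phase": "Pending"}},
    ]
}
```

In practice the parsed output of `json.load(sys.stdin)` would be passed in place of `sample`.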

Check Prometheus targets¶

kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Open http://localhost:9090/targets
# Look for serviceMonitor/grokdevops/grokdevops

Query metrics in Grafana¶

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Open http://localhost:3000 (admin/admin)
# Explore -> Prometheus -> http_requests_total
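Beyond the raw counter, two queries are typically useful with these metrics: request rate by status and 95th-percentile latency. The sketch below holds both PromQL expressions and builds a URL for the Prometheus instant-query endpoint (`/api/v1/query`). It assumes the port-forward from the previous step is active on `localhost:9090`; the 5-minute window is an arbitrary choice.

```python
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumption: port-forward is running

# Per-status request rate over the last 5 minutes.
REQUEST_RATE = "sum by (status) (rate(http_requests_total[5m]))"

# 95th-percentile request latency, computed from the histogram buckets.
P95_LATENCY = (
    "histogram_quantile(0.95, sum by (le) "
    "(rate(http_request_duration_seconds_bucket[5m])))"
)


def instant_query_url(expr, base_url=PROMETHEUS_URL):
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"
```

The same expressions can be pasted directly into Grafana Explore with the Prometheus data source selected.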

Query logs in Grafana¶

# In Grafana Explore, select Loki data source
# Query: {namespace="grokdevops"}
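The same LogQL selector can be sent to Loki's HTTP API (`/loki/api/v1/query_range`) for scripted checks. The sketch below builds the selector and the request URL without sending it. The base URL assumes a local port-forward to Loki on port 3100, which this page does not set up, and label values are not escaped.

```python
from urllib.parse import urlencode


def logql_selector(**labels):
    """Build a LogQL stream selector such as {namespace="grokdevops"}."""
    pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return "{" + pairs + "}"


def loki_query_range_url(query, limit=100, base_url="http://localhost:3100"):
    """Build a URL for Loki's query_range endpoint (assumed local port-forward)."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url}/loki/api/v1/query_range?{params}"


selector = logql_selector(namespace="grokdevops")
# A line filter narrows the stream to lines containing "error".
errors_only = selector + ' |= "error"'
```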

Uninstall¶

./devops/scripts/install-observability.sh --uninstall

Values Files Reference¶

| File | Chart | Purpose |
| --- | --- | --- |
| `devops/observability/values/values-prometheus.yaml` | prometheus-community/kube-prometheus-stack | Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics |
| `devops/observability/values/values-loki.yaml` | grafana/loki | Loki log aggregation (single-binary mode) |
| `devops/observability/values/values-promtail.yaml` | grafana/promtail | Promtail log collection DaemonSet |
| `devops/observability/values/values-tempo.yaml` | grafana/tempo | Tempo distributed tracing backend |

ServiceMonitor¶

The preferred approach is the Helm-managed ServiceMonitor. Enable it in your values file:

monitoring:
  serviceMonitor:
    enabled: true

This is already enabled in values-dev.yaml, values-staging.yaml, and values-prod.yaml.

A legacy standalone manifest exists at devops/k8s/monitoring/servicemonitor.legacy.yaml for environments not using the Helm chart.

HPA and Metrics Server¶

Production values (values-prod.yaml) enable a HorizontalPodAutoscaler. The HPA requires metrics-server to read CPU utilization. Without metrics-server, the HPA cannot compute utilization and will not scale; `kubectl get hpa` shows the current target as `<unknown>`.

k3s ships with metrics-server enabled by default. If it was disabled, or the cluster does not bundle it, install it with:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
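To show why the HPA depends on a CPU reading, the sketch below implements the scaling rule documented for the Kubernetes HorizontalPodAutoscaler: desired replicas equal the current replica count times the ratio of current to target utilization, rounded up, with no change when the ratio is within a tolerance (0.1 by default). The utilization figures are hypothetical, since the targets in values-prod.yaml are not shown here, and min/max replica clamping is omitted.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization, tolerance=0.1):
    """Apply the HPA scaling rule, ignoring min/max replica bounds."""
    ratio = current_utilization / target_utilization
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

With a hypothetical 60% CPU target, four replicas averaging 90% would scale to six. Without metrics-server there is no current utilization value to feed into this calculation.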
