
Observability Architecture

Overview

The GrokDevOps observability stack provides metrics, logging, and tracing using open-source tools deployed on Kubernetes.

Stack Components

Component     Role                             Port
Prometheus    Metrics collection and storage   9090
Grafana       Dashboards and visualization     3000
Alertmanager  Alert routing                    9093
Loki          Log aggregation                  3100
Promtail      Log collection (DaemonSet)       -
Tempo         Distributed tracing              3200

Helm Releases

All components are installed using curated Helm values files from devops/observability/values/.

Release Name           Chart                                        Values File
kube-prometheus-stack  prometheus-community/kube-prometheus-stack   values-prometheus.yaml
loki                   grafana/loki                                 values-loki.yaml
promtail               grafana/promtail                             values-promtail.yaml
tempo                  grafana/tempo                                values-tempo.yaml
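Each row of the release table corresponds to one idempotent helm upgrade --install call. The following is a hypothetical dry-run sketch, not the contents of install-observability.sh: it assumes the stack goes into the monitoring namespace (as the Verification commands suggest) and that the prometheus-community and grafana chart repositories are already added. It only builds and prints the commands.

```python
from pathlib import PurePosixPath

# Assumption: install namespace is "monitoring" (matches the kubectl examples
# under Verification). Chart repositories are assumed to be added already.
NAMESPACE = "monitoring"
VALUES_DIR = PurePosixPath("devops/observability/values")

# (release name, chart, values file), copied from the Helm Releases table.
RELEASES = [
    ("kube-prometheus-stack", "prometheus-community/kube-prometheus-stack", "values-prometheus.yaml"),
    ("loki", "grafana/loki", "values-loki.yaml"),
    ("promtail", "grafana/promtail", "values-promtail.yaml"),
    ("tempo", "grafana/tempo", "values-tempo.yaml"),
]


def helm_commands(namespace: str = NAMESPACE) -> list[list[str]]:
    """Build one idempotent `helm upgrade --install` argv per release."""
    return [
        [
            "helm", "upgrade", "--install", release, chart,
            "--namespace", namespace, "--create-namespace",
            "--values", str(VALUES_DIR / values_file),
        ]
        for release, chart, values_file in RELEASES
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in helm_commands():
        print(" ".join(argv))
```

Using upgrade --install rather than plain install means rerunning the same commands updates existing releases instead of failing.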

Data Flow

Metrics: Application -> Prometheus -> Grafana

FastAPI (/metrics)
    |
    v
ServiceMonitor (selects app by labels)
    |
    v
Prometheus (scrapes every 30s)
    |
    v
Grafana (dashboards, alerts)

The application exposes a /metrics endpoint using prometheus-client. Metrics include:

  • http_requests_total — counter with labels: method, endpoint, status
  • http_request_duration_seconds — histogram with labels: method, endpoint
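Because http_request_duration_seconds is a histogram, Prometheus stores it as cumulative _bucket series, and latency percentiles are estimated at query time, for example with histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))). The sketch below is a simplified, stdlib-only illustration of that estimate (linear interpolation inside the bucket containing the target rank). It assumes sorted cumulative buckets ending in +Inf and positive upper bounds; the bucket values in any example are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Simplified sketch of PromQL's histogram_quantile(): buckets must be sorted,
    cumulative, and end with an upper bound of +Inf.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank lands in the +Inf bucket: report the last finite bound.
                return lower_bound
            # Assume observations are spread uniformly inside this bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan
```

With buckets [(0.1, 50), (0.5, 90), (1.0, 100), (inf, 100)], the 95th percentile falls halfway through the 0.5 to 1.0 bucket, giving 0.75 seconds. The estimate can only be as precise as the bucket boundaries.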

The Helm chart includes a ServiceMonitor template (gated by monitoring.serviceMonitor.enabled) that tells Prometheus how to scrape the application.

Logging: Application -> Promtail -> Loki -> Grafana

FastAPI (stdout/stderr)
    |
    v
Promtail (DaemonSet, reads container logs)
    |
    v
Loki (stores and indexes by labels)
    |
    v
Grafana (LogQL queries)

Promtail runs as a DaemonSet on every node. It tails container log files from /var/log/pods and ships them to Loki with Kubernetes metadata labels (namespace, pod, container).
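The kubelet writes each container's log to /var/log/pods/<namespace>_<pod>_<pod-uid>/<container>/<restart-count>.log, which is why a node-level DaemonSet can see every pod's output. Promtail itself gets its namespace, pod, and container labels from Kubernetes service discovery rather than by parsing paths. The illustrative sketch below only shows how that directory layout lines up with those labels. The example pod name and UID are hypothetical.

```python
from pathlib import PurePosixPath


def labels_from_log_path(path: str) -> dict[str, str]:
    """Derive namespace/pod/container from a kubelet pod log path (illustration only)."""
    parts = PurePosixPath(path).parts
    # Expected: ('/', 'var', 'log', 'pods', '<ns>_<pod>_<uid>', '<container>', '<n>.log')
    if parts[:4] != ("/", "var", "log", "pods") or len(parts) != 7:
        raise ValueError(f"not a pod log path: {path}")
    pod_dir, container = parts[4], parts[5]
    # Namespace and pod names cannot contain "_", so this split is unambiguous.
    namespace, pod, _uid = pod_dir.split("_")
    return {"namespace": namespace, "pod": pod, "container": container}
```

For example, /var/log/pods/grokdevops_grokdevops-7d9f_1234-abcd/grokdevops/0.log maps to namespace grokdevops, pod grokdevops-7d9f, container grokdevops.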

Tracing: Application -> Tempo -> Grafana

Application (OTLP exporter, future)
    |
    v
Tempo (trace storage)
    |
    v
Grafana (trace visualization)

Tempo is deployed and ready. Application-side OpenTelemetry instrumentation is a future addition.

Installation

There is a single canonical installer script, with an Ansible playbook as an alternative entry point:

# Install the full observability stack
./devops/scripts/install-observability.sh

# Or via Ansible
cd devops/ansible
ansible-playbook playbooks/install-addons.yml

Both paths use the same values files and Helm release names.

Verification

Check that all pods are running

kubectl -n monitoring get pods

Check Prometheus targets

kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Open http://localhost:9090/targets
# Look for serviceMonitor/grokdevops/grokdevops
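The same check can be scripted against the Prometheus HTTP API endpoint /api/v1/targets while the port-forward above is running. This is a minimal stdlib sketch: it assumes Prometheus is reachable at http://localhost:9090 and relies on prometheus-operator naming ServiceMonitor scrape pools serviceMonitor/<namespace>/<name>/<endpoint-index>.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumes the port-forward above is running
APP_POOL = "serviceMonitor/grokdevops/grokdevops"


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """GET /api/v1/targets and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def app_target_health(body: dict, pool_prefix: str = APP_POOL) -> list[tuple[str, str]]:
    """Return (scrapeUrl, health) for each active target in the app's scrape pool."""
    # Pools look like "serviceMonitor/<ns>/<name>/<endpoint index>"; the trailing
    # "/" keeps e.g. "grokdevops-other" from matching "grokdevops".
    return [
        (target["scrapeUrl"], target["health"])
        for target in body["data"]["activeTargets"]
        if target["scrapePool"].startswith(pool_prefix + "/")
    ]
```

Calling app_target_health(fetch_targets()) should list one entry per application pod, each reporting up. An empty list usually means the ServiceMonitor was not picked up.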

Query metrics in Grafana

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Open http://localhost:3000 (admin/admin)
# Explore -> Prometheus -> http_requests_total
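Grafana's Explore view sends PromQL to Prometheus, so the same question can be asked of the Prometheus HTTP API endpoint /api/v1/query directly. The sketch below assumes the Prometheus port-forward from "Check Prometheus targets" is running on localhost:9090. It computes the per-second request rate over 5 minutes grouped by the status label of http_requests_total.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumes the Prometheus port-forward is running
REQUEST_RATE_BY_STATUS = "sum by (status) (rate(http_requests_total[5m]))"


def query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_query(promql: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Execute an instant query and return the decoded JSON body."""
    with urllib.request.urlopen(query_url(promql, base_url), timeout=10) as resp:
        return json.load(resp)


def vector_by_label(body: dict, label: str) -> dict[str, float]:
    """Map one label's value to each series' sample in an instant-vector result."""
    if body.get("status") != "success" or body["data"]["resultType"] != "vector":
        raise ValueError("expected a successful instant-vector response")
    # Each sample's "value" is [<unix timestamp>, "<value as a string>"].
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in body["data"]["result"]
    }
```

Calling vector_by_label(run_query(REQUEST_RATE_BY_STATUS), "status") yields a mapping such as status code to requests per second.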

Query logs in Grafana

# In Grafana Explore, select Loki data source
# Query: {namespace="grokdevops"}
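Outside Grafana, the same LogQL selector can be sent to Loki's /loki/api/v1/query_range endpoint. This sketch assumes a port-forward to Loki on localhost:3100 (for example kubectl port-forward -n monitoring svc/loki 3100:3100; the service name is an assumption that depends on the chart values). It also assumes Promtail attached the pod label described in the logging data flow.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

LOKI_URL = "http://localhost:3100"  # assumption: port-forward to the Loki service


def query_range_url(logql: str, minutes: int = 15, limit: int = 100,
                    base_url: str = LOKI_URL, end_ns: Optional[int] = None) -> str:
    """Build a query_range URL covering the last `minutes`, newest lines first."""
    end = time.time_ns() if end_ns is None else end_ns
    start = end - minutes * 60 * 1_000_000_000  # nanosecond Unix timestamps
    params = {"query": logql, "start": start, "end": end,
              "limit": limit, "direction": "backward"}
    return f"{base_url}/loki/api/v1/query_range?" + urllib.parse.urlencode(params)


def fetch(url: str) -> dict:
    """GET the URL and return the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def log_lines(body: dict) -> list[tuple[int, str, str]]:
    """Flatten a "streams" result into (timestamp_ns, pod, line), newest first."""
    rows = []
    for stream in body["data"]["result"]:
        pod = stream["stream"].get("pod", "")
        for ts, line in stream["values"]:  # each value is ["<ns timestamp>", "<line>"]
            rows.append((int(ts), pod, line))
    return sorted(rows, reverse=True)
```

For example, log_lines(fetch(query_range_url('{namespace="grokdevops"}'))) returns the last 15 minutes of application output, merged across pods.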

Uninstall

./devops/scripts/install-observability.sh --uninstall

Values Files Reference

File                                                 Chart                  Purpose
devops/observability/values/values-prometheus.yaml   kube-prometheus-stack  Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics
devops/observability/values/values-loki.yaml         grafana/loki           Loki log aggregation (single-binary mode)
devops/observability/values/values-promtail.yaml     grafana/promtail       Promtail log collection DaemonSet
devops/observability/values/values-tempo.yaml        grafana/tempo          Tempo distributed tracing backend

ServiceMonitor

The preferred approach is the Helm-managed ServiceMonitor. Enable it in your values file:

monitoring:
  serviceMonitor:
    enabled: true

This is already enabled in values-dev.yaml, values-staging.yaml, and values-prod.yaml.

A legacy standalone manifest exists at devops/k8s/monitoring/servicemonitor.legacy.yaml for environments not using the Helm chart.
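Whichever manifest is used, the ServiceMonitor finds the application's Service through a label selector. The sketch below illustrates Kubernetes matchLabels semantics: every selector key/value pair must appear on the Service, extra Service labels are ignored, and an empty selector matches everything. It does not cover matchExpressions. The label keys and Service names in the example are hypothetical, not taken from the chart's templates.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every selector key/value pair is present in labels (matchLabels)."""
    # all() over an empty selector is True, so an empty selector matches everything.
    return all(labels.get(key) == value for key, value in selector.items())


def select_services(selector: dict[str, str],
                    services: dict[str, dict[str, str]]) -> list[str]:
    """Return the names of Services whose labels satisfy the selector."""
    return [name for name, labels in services.items() if matches(selector, labels)]
```

A common cause of a missing scrape target is a selector label that the Service does not carry. A single mismatched value excludes the Service.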

HPA and Metrics Server

Production values (values-prod.yaml) enable a HorizontalPodAutoscaler. The HPA requires metrics-server to read CPU utilization. Without metrics-server, the HPA will not scale.

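k3s bundles metrics-server by default. If it is missing (for example, because k3s was started with --disable metrics-server), install it manually: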

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
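Once metrics-server reports CPU usage, the HPA derives a replica count from the ratio of observed to target utilization: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). Scaling is skipped when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below is simplified: it ignores stabilization windows, unready pods, and missing metrics. The min_replicas and max_replicas defaults are hypothetical, not the values in values-prod.yaml.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a single CPU-utilization metric."""
    if target_utilization <= 0:
        raise ValueError("target utilization must be positive")
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling action
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6. Without metrics-server there is no current_utilization to feed this calculation, which is why the HPA cannot scale.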

