Level 4: Operations & Observability

Helm, Prometheus, Loki, Tempo, Grafana, CI/CD, GitOps, Ansible, Terraform.

Concepts

helm_chart, helm_upgrade, helm_rollback, helm_values, helm_template, prometheus, servicemonitor, grafana, loki_logging, promtail, tempo, opentelemetry, continuous_integration, continuous_delivery, github_actions, trivy, gitops, argocd, config_drift, ansible, ansible_playbook, terraform, k3s_cluster, cordon_drain

Failure Patterns You Should Be Able to Resolve

FP-009: prometheus_target_down
FP-010: loki_no_logs
FP-011: helm_upgrade_failure
FP-015: tempo_no_traces
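For FP-009 (prometheus_target_down), the Status > Targets page in the Prometheus UI is the usual starting point, and the same data is exposed by the Prometheus HTTP API at `/api/v1/targets`. The sketch below assumes Prometheus is reachable on `localhost:9090` (for example through a `kubectl port-forward` to the Prometheus service) and lists every active target whose `health` is not `up`, together with its `lastError`. `fetch_targets` and `unhealthy_targets` are hypothetical helper names for illustration, not part of any library.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any


def fetch_targets(base_url: str = "http://localhost:9090") -> dict[str, Any]:
    """GET /api/v1/targets from Prometheus (assumed reachable, e.g. via port-forward)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (scrapePool, scrapeUrl, lastError) for each active target that is not "up"."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus API returned status {payload.get('status')!r}")
    down = []
    for target in payload["data"]["activeTargets"]:
        # health is "up", "down", or "unknown" (not scraped yet)
        if target.get("health") != "up":
            down.append(
                (
                    target.get("scrapePool", ""),
                    target.get("scrapeUrl", ""),
                    target.get("lastError", ""),
                )
            )
    return down
```

With the port-forward running, `for pool, url, err in unhealthy_targets(fetch_targets()): print(pool, url, err)` prints each failing scrape. An error such as `connection refused` typically points at the Service port or metrics path. A target that does not appear at all usually means the ServiceMonitor selector matched nothing, which this check cannot show.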

Commands You Should Be Fluent With

helm list / history / status / get values / rollback
helm upgrade / template --debug / lint
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
kubectl get servicemonitor -o yaml
kubectl get pods -n monitoring -l app.kubernetes.io/name=promtail
ansible-playbook -i <inventory> <playbook>
terraform init / plan / apply
argocd app get / sync
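For FP-011 (helm_upgrade_failure), `helm rollback <release>` without a revision returns to the previous revision, which after repeated failed upgrades may itself be a failed one. The sketch below picks a rollback target from `helm history` output instead. It assumes Helm 3 with `-o json`, whose entries carry `revision` and `status` fields (e.g. `deployed`, `superseded`, `failed`). `helm_history`, `last_good_revision` and `rollback_command` are hypothetical helper names, not Helm APIs.

```python
from __future__ import annotations

import json
import subprocess
from typing import Any, Optional

# Statuses of revisions that were successfully released at some point.
GOOD_STATUSES = {"deployed", "superseded"}


def helm_history(release: str, namespace: str) -> list[dict[str, Any]]:
    """Run `helm history <release> -n <namespace> -o json` and parse the revision list."""
    result = subprocess.run(
        ["helm", "history", release, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def last_good_revision(history: list[dict[str, Any]]) -> Optional[int]:
    """Newest revision, excluding the latest one, whose status is deployed or superseded."""
    revisions = sorted(history, key=lambda r: r["revision"])
    # Skip the latest revision: it is the one being rolled back from.
    for rev in reversed(revisions[:-1]):
        if rev.get("status") in GOOD_STATUSES:
            return int(rev["revision"])
    return None


def rollback_command(release: str, namespace: str, revision: int) -> list[str]:
    """Build the argv for `helm rollback <release> <revision> -n <namespace>`."""
    return ["helm", "rollback", release, str(revision), "-n", namespace]
```

Printing `rollback_command(...)` and reviewing it before running it with `subprocess.run` keeps a human in the loop. Compare the target revision's values with `helm get values <release> --revision <n>` first.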

Assets to Complete

Read (theory)

Practice (hands-on)

training/interactive/runtime-labs/lab-runtime-05-helm-upgrade-rollback/
training/interactive/runtime-labs/lab-runtime-03-observability-target-down/
training/interactive/runtime-labs/lab-runtime-04-loki-no-logs/
training/interactive/runtime-labs/lab-runtime-07-gitops-sync-and-drift/
Quest Ladder Ansible track: levels 1-10 (training/interactive/exercises/levels/level-*/ansible-*)
Study the chart structure in devops/helm/grokdevops/
