Level 4: Operations & Observability¶
Helm, Prometheus, Loki, Tempo, Grafana, CI/CD, GitOps, Ansible, Terraform.
Concepts¶
helm_chart, helm_upgrade, helm_rollback, helm_values, helm_template, prometheus, servicemonitor, grafana, loki_logging, promtail, tempo, opentelemetry, continuous_integration, continuous_delivery, github_actions, trivy, gitops, argocd, config_drift, ansible, ansible_playbook, terraform, k3s_cluster, cordon_drain
Failure Patterns You Should Be Able to Resolve¶
- FP-009: prometheus_target_down
- FP-010: loki_no_logs
- FP-011: helm_upgrade_failure
- FP-015: tempo_no_traces
Commands You Should Be Fluent With¶
- helm list/history/status/get values/rollback
- helm upgrade/template --debug/lint
- kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
- kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
- kubectl get servicemonitor -o yaml
- kubectl get pods -n monitoring -l app.kubernetes.io/name=promtail
- ansible-playbook -i <inventory> <playbook>
- terraform init/plan/apply
- argocd app get/sync
Assets to Complete¶
Read (theory)¶
- devops/docs/observability.md
- training/library/skillchecks/observability.skillcheck.md
- devops/docs/ci-pipeline.md
- devops/docs/gitops-example.md
- training/library/skillchecks/cicd.skillcheck.md
- training/library/skillchecks/terraform.iac.md
- devops/docs/ansible-cluster-management.md
Practice (hands-on)¶
- training/interactive/runtime-labs/lab-runtime-05-helm-upgrade-rollback/
- training/interactive/runtime-labs/lab-runtime-03-observability-target-down/
- training/interactive/runtime-labs/lab-runtime-04-loki-no-logs/
- training/interactive/runtime-labs/lab-runtime-07-gitops-sync-and-drift/
- Quest Ladder ansible track: levels 1-10 (training/interactive/exercises/levels/level-*/ansible-*)
- Study devops/helm/grokdevops/ chart structure
Runbooks (study)¶
- training/library/runbooks/cicd/helm_upgrade_failed.md
- training/library/runbooks/prometheus_target_down.md
- training/library/runbooks/observability/loki_no_logs.md
- training/library/runbooks/observability/tempo_no_traces.md
Review (flashcards)¶
- training/interactive/knowledge/data/cards/prometheusstack.tsv
- training/interactive/knowledge/data/cards/cicd.tsv
- training/interactive/knowledge/data/cards/terraform.tsv
- training/interactive/knowledge/data/cards/ansible.tsv