Level 4: Operations & Observability

Helm, Prometheus, Loki, Tempo, Grafana, CI/CD, GitOps, Ansible, Terraform.

Concepts

helm_chart, helm_upgrade, helm_rollback, helm_values, helm_template, prometheus, servicemonitor, grafana, loki_logging, promtail, tempo, opentelemetry, continuous_integration, continuous_delivery, github_actions, trivy, gitops, argocd, config_drift, ansible, ansible_playbook, terraform, k3s_cluster, cordon_drain

Failure Patterns You Should Be Able to Resolve

  • FP-009: prometheus_target_down
  • FP-010: loki_no_logs
  • FP-011: helm_upgrade_failure
  • FP-015: tempo_no_traces
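For FP-009 (prometheus_target_down), a useful first check is to ask Prometheus which scrape targets it considers unhealthy. Below is a minimal Python sketch. It assumes the Prometheus port-forward from the command list is running on localhost:9090, and it reads the real /api/v1/targets endpoint, whose activeTargets entries carry health, scrapePool, scrapeUrl and lastError fields. The function names are illustrative, not part of any tool.

```python
import json
import urllib.request

# Assumption: `kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090`
# is running, so the Prometheus HTTP API is reachable on localhost.
PROM_TARGETS_URL = "http://localhost:9090/api/v1/targets"


def fetch_targets(url=PROM_TARGETS_URL):
    """Download and parse the JSON returned by the Prometheus targets API."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (scrapePool, scrapeUrl, lastError) for each active target whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapePool", ""), t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"  # health is one of "up", "down", "unknown"
    ]


if __name__ == "__main__":
    for pool, url, err in unhealthy_targets(fetch_targets()):
        print(f"{pool}\t{url}\t{err}")
```

The lastError text usually separates network problems (connection refused, timeouts) from endpoint problems (wrong port or path). A target that is missing from the list entirely points elsewhere: in kube-prometheus-stack installs, this often means the ServiceMonitor's labels do not match the Prometheus serviceMonitorSelector.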

Commands You Should Be Fluent With

  • helm list / history / status / get values / rollback
  • helm upgrade / template --debug / lint
  • kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
  • kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
  • kubectl get servicemonitor -o yaml
  • kubectl get pods -n monitoring -l app.kubernetes.io/name=promtail
  • ansible-playbook -i <inventory> <playbook>
  • terraform init / plan / apply
  • argocd app get / sync
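When resolving FP-011 (helm_upgrade_failure), the helm history and helm rollback commands above are typically used together: find the most recent revision Helm recorded as good, then roll back to it. Below is a minimal Python sketch. It assumes helm is on PATH and that helm history -o json returns a list of objects with an integer revision and a string status. The helper names and the release/namespace values in the main block are hypothetical.

```python
import json
import subprocess

# Statuses treated as "known good" for rollback purposes (an assumption of this sketch).
GOOD_STATUSES = ("deployed", "superseded")


def load_history(release, namespace):
    """Run `helm history` for a release and parse its JSON output."""
    out = subprocess.run(
        ["helm", "history", release, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def last_good_revision(history):
    """Highest revision, excluding the newest (the one being rolled back from), with a good status."""
    ordered = sorted(history, key=lambda r: r["revision"])
    good = [r["revision"] for r in ordered[:-1] if r["status"] in GOOD_STATUSES]
    return good[-1] if good else None


def rollback_command(release, namespace, revision):
    """Build the argv for `helm rollback`; --wait blocks until the rolled-back resources are ready."""
    return ["helm", "rollback", release, str(revision), "-n", namespace, "--wait"]


if __name__ == "__main__":
    release, namespace = "my-app", "default"  # hypothetical release and namespace
    rev = last_good_revision(load_history(release, namespace))
    if rev is None:
        print("no earlier good revision found")
    else:
        cmd = rollback_command(release, namespace, rev)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Naming the revision explicitly matters because helm rollback with no revision targets the immediately previous release. After several failed upgrades in a row, that previous release may itself be a failed one.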

Assets to Complete

Read (theory)

Practice (hands-on)

Runbooks (study)

Review (flashcards)