Start Here

This repo is a training system for DevOps and Kubernetes skills. It contains a real application, real infrastructure code, and 300+ learning exercises designed to build operational intuition through deliberate practice.

What's in the box

| Component | What it is | Where |
| --- | --- | --- |
| Exercises | 250 break/fix exercises across Bash, Python, Docker, Ansible, K8s | training/interactive/exercises/ |
| Runtime Labs | 8 hands-on labs against a live cluster (break -> investigate -> fix -> verify) | training/interactive/runtime-labs/ |
| Runbooks | 13 incident response runbooks for common K8s failures | training/library/runbooks/ |
| Incident Simulator | 18 injectable failure scenarios with forensics + scoring | training/interactive/incidents/ |
| Investigation Engine | Guided debugging with playbooks, progressive hints, and journaling | training/interactive/investigation/ |
| Interview Scenarios | 11 realistic SRE interview scenarios with expected answers | training/library/interview-scenarios/ |
| Chaos Scripts | 7 safe, reversible chaos experiments | training/interactive/chaos/ |
| Skillchecks | Self-assessment questionnaires by technology | training/library/skillchecks/ |
| Flashcards | Spaced-repetition cards for key concepts | training/interactive/knowledge/data/cards/ |
| Drills | Quick muscle-memory exercises (1-5 min each) | training/library/drills/ |
| Knowledge Compendiums | 2,491 trivia Q&A across Ansible, Linux, Python (shuffled) | Knowledge Compendiums |
| Curriculum | Structured tracks, levels, and learning paths | training/library/curriculum/ |

Mental model

                  +-----------+
                  | Concepts  |  <- docs, cards, skillchecks
                  +-----+-----+
                        |
                  +-----v-----+
                  | Practice  |  <- exercises, drills
                  +-----+-----+
                        |
                  +-----v-----+
                  | Apply     |  <- runtime labs, chaos, incidents
                  +-----+-----+
                        |
                  +-----v-----+
                  | Reflect   |  <- runbooks, investigation journal, interview prep
                  +-----------+

  • Exercises build foundational skills (syntax, manifests, basic debugging)
  • Runtime labs apply those skills against a real running cluster
  • Runbooks teach structured thinking about failure patterns
  • Incidents + investigation simulate production pressure
  • Drills build command-line muscle memory
  • Interview scenarios test your ability to articulate reasoning

10-Minute Sanity Warm-Up

Confirm everything works before you start training.

1. Deploy the stack (3 min)

make deploy-all    # Deploys observability stack + grokdevops app
make status        # Verify everything is running

You should see:

grokdevops    deployment/grokdevops   1/1     Running
monitoring    deployment/kube-prometheus-stack-grafana   1/1     Running
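
If you would rather block until both deployments are ready than re-run make status by hand, the check can be sketched with kubectl rollout status. This is a minimal sketch, not part of the repo: the function name wait_for_deployments and the ROLLOUT_TIMEOUT variable are hypothetical, and the namespace/deployment names are taken from the expected output above.

```shell
# Hypothetical helper (not shipped with the repo).
# Wait until each "namespace/deployment" pair finishes rolling out.
# Returns non-zero on the first deployment that is not ready in time.
wait_for_deployments() {
  timeout="${ROLLOUT_TIMEOUT:-120s}"
  for target in "$@"; do
    ns="${target%%/*}"      # text before the first slash
    name="${target#*/}"     # text after the first slash
    kubectl rollout status "deployment/$name" -n "$ns" --timeout="$timeout" || return 1
  done
}

# Usage, with the names from the expected output:
# wait_for_deployments grokdevops/grokdevops monitoring/kube-prometheus-stack-grafana
```

The function only defines the check; nothing runs until you call it, so it is safe to paste into a shell before the stack is deployed.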

2. Run one lab (5 min)

# Break: Introduce a readiness probe failure
make lab LAB=lab-runtime-01 MODE=break

# Observe: Check what broke
kubectl get pods -n grokdevops
# You should see: 0/1 Running (not READY)

# Fix: Restore the probe
make lab LAB=lab-runtime-01 MODE=fix

# Verify: Confirm it's fixed
make lab LAB=lab-runtime-01 MODE=verify

# Teardown: Clean up
make lab LAB=lab-runtime-01 MODE=teardown
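
During the observe step, the READY column is what distinguishes a healthy pod (1/1) from one failing its readiness probe (0/1). As a rough sketch, assuming the default kubectl get pods column layout, a small filter can list only the pods that are not fully ready. The function name not_ready_pods is hypothetical.

```shell
# Hypothetical helper (not shipped with the repo).
# Read `kubectl get pods --no-headers` output on stdin and print the name of
# every pod whose READY column (for example 0/1) does not match ready/total.
not_ready_pods() {
  awk '{ split($2, ready, "/"); if (ready[1] != ready[2]) print $1 }'
}

# Usage against the lab namespace:
# kubectl get pods -n grokdevops --no-headers | not_ready_pods
```

After the break step the filter should print at least one pod name; after the fix step it should print nothing.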

3. Read one runbook (2 min)

Open training/library/runbooks/crashloopbackoff.md and read through the symptoms, triage commands, and fix steps.
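
To get a feel for what triage commands look like before opening the runbook, here is a generic first pass for a crash-looping pod. It is an illustrative sketch built from standard kubectl subcommands, not a copy of the runbook's contents; the function name triage_crashloop is hypothetical.

```shell
# Hypothetical helper (not taken from the runbook).
# First-pass triage for a pod stuck in CrashLoopBackOff.
# Arguments: namespace, pod name.
triage_crashloop() {
  ns="$1"
  pod="$2"
  # Container state, restart count, last termination reason and exit code.
  kubectl describe pod "$pod" -n "$ns"
  # Logs from the previous (crashed) container instance.
  kubectl logs "$pod" -n "$ns" --previous --tail=50
  # Events for this pod, sorted by timestamp.
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" --sort-by=.lastTimestamp
}

# Usage:
# triage_crashloop grokdevops <pod-name>
```

Compare the output of each command with the symptoms the runbook describes; the runbook remains the reference for the actual fix steps.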

You're ready. Pick a path below.

Core Skills

Ansible is one of the three most important skills in this training system (ranked Linux > Ansible > Python). Everything you need is in one place, the Ansible Hub: topics, lessons, drills, flashcards, case studies, runbooks, and interview prep.

Choose Your Path

New to DevOps? The Breaking Into DevOps path is a 16-20 week guided journey from zero to job-ready. It covers Linux, networking, containers, IaC, CI/CD, security, and a portfolio capstone.

All paths are in library/portal/paths.md:

| Path | Time commitment | Best for | Start |
| --- | --- | --- | --- |
| Daily Driver | 30-45 min/day for 4 weeks | Steady skill building | paths.md#daily-driver |
| Crash Course | 2 weekends | Fast ramp-up | paths.md#crash-course |
| Accelerated DevOps | Self-paced, ~20-30 hours | Experienced engineers, fastest K8s/SRE path | paths.md#accelerated-devops-15-steps |
| Interview Prep | 1-2 weeks | Job preparation | paths.md#interview-prep |
| Breaking Into DevOps | 60-90 min/day for 16-20 weeks | Career changers starting from scratch | paths.md#breaking-into-devops |
| Comprehensive | 60-90 min/day for 40 weeks | All topics + case studies | paths.md#comprehensive |

Browse Content

| Browse by... | What it does | Link |
| --- | --- | --- |
| What's New | Recently added and updated content | library/portal/whats_new.md |
| Cross-Domain Lessons | 115 narratives that follow real problems across domains | library/lessons/index.md |
| Content type | Find primers, runbooks, flashcards, scenarios, etc. | library/portal/content_hub.md |
| Topic | Find any subject (DNS, RAID, jq, drain...) | library/portal/topics.md |
| Level | Entry (L0) to Advanced (L3) across all domains | library/portal/levels.md |
| Tag Cloud | Visual map; click an area to see everything in it | library/guides/tag-cloud.md |
| Cross-Domain Incidents | 20 case studies spanning 3 domains each | library/portal/cross_domain.md |

What Do You Want to Do?

I want to practice

| Activity | Link |
| --- | --- |
| Break/fix labs against a live cluster | library/labs/README.md |
| Quick command drills (1-5 min each) | library/drills/README.md |
| Flashcards (spaced repetition) | library/portal/interactive/flashcards/ |
| Quiz yourself | library/portal/interactive/quiz/ |
| Incident response under pressure | library/interview-scenarios/ |
| Self-assessment by technology | library/skillchecks/ |

I want to browse and learn

| Content | Link |
| --- | --- |
| Everything by type, domain, or tier | library/portal/content_hub.md |
| Ansible: everything in one place | library/portal/ansible-hub.md |
| Cross-domain lessons (115 narratives) | library/lessons/index.md |
| Postmortems and war stories | library/postmortems/README.md |
| Design patterns | library/patterns/README.md |

I want to follow a plan

| Path | Time | Link |
| --- | --- | --- |
| Daily Driver | 30-45 min/day, 4 weeks | paths.md#daily-driver |
| Crash Course | 2 weekends | paths.md#crash-course |
| Accelerated DevOps | ~20-30 hours | paths.md#accelerated-devops-15-steps |
| Interview Prep | 1-2 weeks | paths.md#interview-prep |
| Breaking Into DevOps | 16-20 weeks | paths.md#breaking-into-devops |
| Full Curriculum (all 207 topics) | 40 weeks | paths.md#comprehensive |

How to Train (not just browse)

  1. Do, don't read: Run the break/fix labs. Reading alone doesn't build skill.
  2. Investigate before fixing: Use make investigate and make hint before looking at solutions
  3. Journal your findings: Use make explain to record what you learned
  4. Time yourself: Use make challenge YES=1 MINUTES=10 for incident response practice
  5. Review with cards: After a topic, study the relevant flashcards or run python3 tools/run_training_session.py build --strategy spaced
  6. Repeat failures: If a lab was hard, redo it next week without hints
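
The "repeat failures" habit is easier to keep with a small log of labs to redo. The following is a minimal sketch under stated assumptions: the log file location and the function names log_hard_lab and labs_due are hypothetical and not part of the repo's tooling.

```shell
# Hypothetical location for the redo log; not part of the repo.
REDO_LOG="${REDO_LOG:-$HOME/.lab-redo.log}"

# Record a lab that felt hard, stamped with the current epoch time.
log_hard_lab() {
  printf '%s %s\n' "$(date +%s)" "$1" >> "$REDO_LOG"
}

# Print labs that were logged at least N days ago (default 7).
labs_due() {
  days="${1:-7}"
  now="$(date +%s)"
  [ -f "$REDO_LOG" ] || return 0
  awk -v now="$now" -v secs="$((days * 86400))" \
    'now - $1 >= secs { print $2 }' "$REDO_LOG"
}

# Usage:
# log_hard_lab lab-runtime-01   # right after a lab that needed hints
# labs_due                      # a week later, lists what to redo
```

Storing epoch seconds keeps the date arithmetic portable across shells; a lab reappears in labs_due once a week has passed since it was logged.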