Start Here
This repo is a training system for DevOps and Kubernetes skills. It contains a real application, real infrastructure code, and 300+ learning exercises designed to build operational intuition through deliberate practice.
What's in the box
| Component | What it is | Where |
|---|---|---|
| Exercises | 250 break/fix exercises across Bash, Python, Docker, Ansible, K8s | training/interactive/exercises/ |
| Runtime Labs | 8 hands-on labs against a live cluster (break -> investigate -> fix -> verify) | training/interactive/runtime-labs/ |
| Runbooks | 13 incident response runbooks for common K8s failures | training/library/runbooks/ |
| Incident Simulator | 18 injectable failure scenarios with forensics + scoring | training/interactive/incidents/ |
| Investigation Engine | Guided debugging with playbooks, progressive hints, and journaling | training/interactive/investigation/ |
| Interview Scenarios | 11 realistic SRE interview scenarios with expected answers | training/library/interview-scenarios/ |
| Chaos Scripts | 7 safe, reversible chaos experiments | training/interactive/chaos/ |
| Skillchecks | Self-assessment questionnaires by technology | training/library/skillchecks/ |
| Flashcards | Spaced-repetition cards for key concepts | training/interactive/knowledge/data/cards/ |
| Drills | Quick muscle-memory exercises (1-5 min each) | training/library/drills/ |
| Knowledge Compendiums | 2,491 trivia Q&A across Ansible, Linux, Python (shuffled) | Knowledge Compendiums |
| Curriculum | Structured tracks, levels, and learning paths | training/library/curriculum/ |
Mental model
+-----------+
| Concepts  |  <- docs, cards, skillchecks
+-----+-----+
      |
+-----v-----+
| Practice  |  <- exercises, drills
+-----+-----+
      |
+-----v-----+
|   Apply   |  <- runtime labs, chaos, incidents
+-----+-----+
      |
+-----v-----+
|  Reflect  |  <- runbooks, investigation journal, interview prep
+-----------+
- Exercises build foundational skills (syntax, manifests, basic debugging)
- Runtime labs apply those skills against a real running cluster
- Runbooks teach structured thinking about failure patterns
- Incidents + investigation simulate production pressure
- Drills build command-line muscle memory
- Interview scenarios test your ability to articulate reasoning
10-Minute Sanity Warm-Up
Confirm everything works before you start training.
1. Deploy the stack (3 min)
make deploy-all # Deploys observability stack + grokdevops app
make status # Verify everything is running
You should see:
grokdevops deployment/grokdevops 1/1 Running
monitoring deployment/kube-prometheus-stack-grafana 1/1 Running
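One way to turn that visual check into a pass/fail step is to parse the status lines. The sketch below assumes `make status` prints lines in exactly the four-column shape shown above (namespace, resource, ready/desired, state). The real output may include headers or extra columns, and `all_ready` is a hypothetical helper, not something the repo provides.

```shell
# all_ready: reads status lines on stdin and succeeds only if every line
# reports ready == desired and a state of "Running".
# Assumed line shape (taken from the sample output, not verified against the repo):
#   <namespace> <kind/name> <ready>/<desired> <state>
all_ready() {
  awk '
    NF >= 4 {
      seen = 1
      split($3, count, "/")            # "1/1" -> count[1]="1", count[2]="1"
      if (count[1] != count[2] || $4 != "Running") {
        print "NOT READY: " $0
        bad = 1
      }
    }
    END {
      # Fail on any bad line, and also when no status lines were seen at all.
      if (bad || !seen) exit 1
    }
  '
}
```

Usage, under the same assumption about the output format: `make status | all_ready && echo "stack is up"`.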
2. Run one lab (5 min)
# Break: Introduce a readiness probe failure
make lab LAB=lab-runtime-01 MODE=break
# Observe: Check what broke
kubectl get pods -n grokdevops
# You should see: 0/1 Running (not READY)
# Fix: Restore the probe
make lab LAB=lab-runtime-01 MODE=fix
# Verify: Confirm it's fixed
make lab LAB=lab-runtime-01 MODE=verify
# Teardown: Clean up
make lab LAB=lab-runtime-01 MODE=teardown
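For the sanity warm-up only, the same sequence can be wrapped so that teardown always runs, even if an earlier step fails. This is a minimal sketch: `run_lab_cycle`, `LAB_CMD`, and `OBSERVE_CMD` are hypothetical names introduced here, and the two overrides exist only so the sequence can be dry-run without a cluster. For real training, run the steps by hand and investigate between break and fix.

```shell
# run_lab_cycle LAB: break -> observe -> fix -> verify, then teardown no matter what.
# LAB_CMD and OBSERVE_CMD are hypothetical overrides, not part of the repo.
run_lab_cycle() {
  local lab="$1"
  local rc=0
  local mode
  for mode in break observe fix verify; do
    if [ "$mode" = observe ]; then
      # Observation is informational; a failure here does not abort the cycle.
      ${OBSERVE_CMD:-kubectl get pods -n grokdevops} || true
      continue
    fi
    # Stop at the first failing lab step, remembering its exit code.
    ${LAB_CMD:-make} lab "LAB=$lab" "MODE=$mode" || { rc=$?; break; }
  done
  # Teardown always runs so a failed attempt does not leave the cluster broken.
  ${LAB_CMD:-make} lab "LAB=$lab" "MODE=teardown" || rc=$?
  return "$rc"
}
```

Usage: `run_lab_cycle lab-runtime-01`. A dry run that only prints the steps: `LAB_CMD=echo OBSERVE_CMD="echo observe" run_lab_cycle lab-runtime-01`.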
3. Read one runbook (2 min)
Open training/library/runbooks/crashloopbackoff.md and read through the symptoms, triage commands, and fix steps.
You're ready. Pick a path below.
Core Skills
Ansible is one of the three most important skills in this training system (Linux > Ansible > Python). Everything you need is in one place: Ansible Hub — topics, lessons, drills, flashcards, case studies, runbooks, and interview prep.
Choose Your Path
New to DevOps? The Breaking Into DevOps path is a 16–20 week guided journey from zero to job-ready — covers Linux, networking, containers, IaC, CI/CD, security, and a portfolio capstone.
All paths are in library/portal/paths.md:
| Path | Time commitment | Best for | Start |
|---|---|---|---|
| Daily Driver | 30-45 min/day for 4 weeks | Steady skill building | paths.md#daily-driver |
| Crash Course | 2 weekends | Fast ramp-up | paths.md#crash-course |
| Accelerated DevOps | Self-paced, ~20-30 hours | Experienced engineers, fastest K8s/SRE path | paths.md#accelerated-devops-15-steps |
| Interview Prep | 1-2 weeks | Job preparation | paths.md#interview-prep |
| Breaking Into DevOps | 60-90 min/day for 16-20 weeks | Career changers starting from scratch | paths.md#breaking-into-devops |
| Comprehensive | 60-90 min/day for 40 weeks | All topics + case studies | paths.md#comprehensive |
Browse Content
| Browse by... | What it does | Link |
|---|---|---|
| What's New | Recently added and updated content | library/portal/whats_new.md |
| Cross-Domain Lessons | 115 narratives that follow real problems across domains | library/lessons/index.md |
| Content type | Find primers, runbooks, flashcards, scenarios, etc. | library/portal/content_hub.md |
| Topic | Find any subject (DNS, RAID, jq, drain...) | library/portal/topics.md |
| Level | Entry (L0) to Advanced (L3) across all domains | library/portal/levels.md |
| Tag Cloud | Visual map — click an area to see everything in it | library/guides/tag-cloud.md |
| Cross-Domain Incidents | 20 case studies spanning 3 domains each | library/portal/cross_domain.md |
What Do You Want to Do?
I want to practice
| Activity | Link |
|---|---|
| Break/fix labs against a live cluster | library/labs/README.md |
| Quick command drills (1-5 min each) | library/drills/README.md |
| Flashcards (spaced repetition) | library/portal/interactive/flashcards/ |
| Quiz yourself | library/portal/interactive/quiz/ |
| Incident response under pressure | library/interview-scenarios/ |
| Self-assessment by technology | library/skillchecks/ |
I want to browse and learn
| Content | Link |
|---|---|
| Everything by type, domain, or tier | library/portal/content_hub.md |
| Ansible — everything in one place | library/portal/ansible-hub.md |
| Cross-domain lessons (115 narratives) | library/lessons/index.md |
| Postmortems and war stories | library/postmortems/README.md |
| Design patterns | library/patterns/README.md |
I want to follow a plan
| Path | Time | Link |
|---|---|---|
| Daily Driver | 30-45 min/day, 4 weeks | paths.md#daily-driver |
| Crash Course | 2 weekends | paths.md#crash-course |
| Accelerated DevOps | ~20-30 hours | paths.md#accelerated-devops-15-steps |
| Interview Prep | 1-2 weeks | paths.md#interview-prep |
| Breaking Into DevOps | 16-20 weeks | paths.md#breaking-into-devops |
| Full Curriculum (all 207 topics) | 40 weeks | paths.md#comprehensive |
How to Train (not just browse)
- Do, don't read: Run the break/fix labs. Reading alone doesn't build skill
- Investigate before fixing: Use `make investigate` and `make hint` before looking at solutions
- Journal your findings: Use `make explain` to record what you learned
- Time yourself: Use `make challenge YES=1 MINUTES=10` for incident response practice
- Review with cards: After a topic, study the relevant flashcards or run `python3 tools/run_training_session.py build --strategy spaced`
- Repeat failures: If a lab was hard, redo it next week without hints
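The "time yourself" and "repeat failures" habits are easier to keep with a small log of attempts to look back on. The sketch below is one possible way to do that. `timed_run`, the `TRAINING_LOG` variable, and the default file name are hypothetical and not part of the repo's tooling, and it complements rather than replaces `make challenge`.

```shell
# timed_run LABEL CMD...: runs CMD, then appends one line to the log:
#   <label> <elapsed seconds>s exit=<exit code>
# TRAINING_LOG is a hypothetical variable; it defaults to training-log.txt.
timed_run() {
  local label="$1"
  shift
  local start end rc=0
  start=$(date +%s)
  "$@" || rc=$?                      # keep going so failures are logged too
  end=$(date +%s)
  printf '%s %ss exit=%s\n' "$label" "$((end - start))" "$rc" \
    >> "${TRAINING_LOG:-training-log.txt}"
  return "$rc"                       # pass the command's exit code through
}
```

Usage, assuming the wrapped command blocks until the attempt ends: `timed_run challenge-week1 make challenge YES=1 MINUTES=10`. Lines ending in a non-zero `exit=` value, or with a long elapsed time, are candidates for next week's redo.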