
GrokDevOps Training Hub

New here? Start at training/START_HERE.md -- the single best onboarding doc.

Looking for everything? See the Content Hub -- browse all content by type, domain, or tier.

One place to start learning. This hub indexes all learning content across the repository and adds runtime labs, chaos scripts, runbooks, and interview scenarios.

Surprise Me

Want to learn something unexpected? Jump to random content on the Random Discovery Page, or browse everything in the Content Hub.

Key Entry Points

Resource         Purpose
START_HERE.md    10-minute warm-up + orientation
Content Hub      Browse all content by type, domain, or tier
Learning Paths   Breaking Into DevOps / Daily Driver / Crash Course / Interview Prep / Comprehensive
Drills           Quick muscle-memory exercises
Flashcards       178 decks, 6255 cards (browse online)
Quiz Bank        722 self-test questions by topic
Solutions        Hint ladders + answer keys for labs

Quickstart

1. Deploy the full stack

make deploy-all    # Deploys observability stack + grokdevops app
make status        # Verify everything is running
make port-forward  # Access app at localhost:8000

2. Use the exercise system (250 exercises across 5 tracks)

source activate.sh                          # From repo root (sets up PATH)
quest list                                  # See all 250 exercises
quest list bash                             # Filter by track
cd training/interactive/exercises/levels/level-01/bash-exit-codes
quest info                                  # See the objective
quest run                                   # Run the broken artifact
quest hint 1                                # Get a nudge if stuck
quest solution                              # See the reference answer
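The example path above implies one directory per exercise under levels/level-NN/. Assuming that layout holds for every level (this README does not spell it out, and quest list stays the authoritative view), a short POSIX shell sketch can tally exercises per level. count_exercises is a hypothetical helper, not part of the quest tool.

```shell
# Hypothetical helper: tally exercise directories per level.
# Assumes the layout <root>/<level>/<exercise>/ seen in the example path;
# `quest list` remains the authoritative listing.
count_exercises() {
    root="${1:-training/interactive/exercises/levels}"
    for level in "$root"/*/; do
        [ -d "$level" ] || continue   # glob matched nothing: skip
        # Count only direct subdirectories (one per exercise).
        n=$(find "$level" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
        printf '%s %s\n' "$(basename "$level")" "$n"
    done
}
```

Run count_exercises from the repo root; each output line is a level name followed by its exercise count.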

3. Run a runtime lab

make lab-list                              # See available labs
make lab LAB=lab-runtime-01 MODE=break     # Introduce failure
# ... investigate and fix ...
make lab LAB=lab-runtime-01 MODE=verify    # Check your fix
make lab LAB=lab-runtime-01 MODE=teardown  # Clean up
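If you run labs back to back, the make commands above can be wrapped in one function. This is a hypothetical sketch (lab_cycle and the DRY_RUN switch are not part of the repo); it only calls the documented make lab modes, pauses while you fix things, and stops before teardown if verify fails so you can keep working on the broken lab.

```shell
# Hypothetical wrapper around the documented `make lab` modes.
# DRY_RUN=1 prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

lab_cycle() {
    lab="$1"
    if [ -z "$lab" ]; then
        echo "usage: lab_cycle LAB" >&2
        return 2
    fi
    run make lab LAB="$lab" MODE=break || return 1
    # Investigate and fix in another terminal, then come back here.
    printf 'Fix the lab, then press Enter to verify... ' >&2
    read -r reply || true
    # If verify fails, stop here so the lab stays up for another attempt.
    run make lab LAB="$lab" MODE=verify || return 1
    run make lab LAB="$lab" MODE=teardown
}
```

For example, DRY_RUN=1 lab_cycle lab-runtime-01 prints the break, verify, and teardown commands without running them.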

4. Guided investigation loop

make deploy-all                  # Ensure stack is running
make incident YES=1              # Trigger a random incident
make investigate                 # See step-by-step investigation plan
# ... use kubectl/helm/grafana to investigate ...
make hint                        # Get a hint (if stuck)
make hint HINT=2                 # Deeper hint
make explain                     # Record what you found
make incident-resolve            # Clear the incident
make undeploy-all                # (optional) tear down
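For the "use kubectl/helm/grafana to investigate" step, a first pass usually starts with read-only commands. A minimal sketch, assuming the app runs in the grokdevops namespace (the namespace used in this README's chaos example); triage and DRY_RUN are hypothetical names, while the kubectl and helm commands themselves are standard.

```shell
# First-pass, read-only triage sketch; namespace defaults to grokdevops.
# DRY_RUN=1 prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

triage() {
    ns="${1:-grokdevops}"
    # What is running, where, and how often has it restarted?
    run kubectl get pods -n "$ns" -o wide
    # Pods outside the Running phase (e.g. Pending, Failed). Crash-looping
    # pods often still report Running, so also check RESTARTS above.
    run kubectl get pods -n "$ns" --field-selector='status.phase!=Running'
    # Events sorted by creation time often name the failing component.
    run kubectl get events -n "$ns" --sort-by=.metadata.creationTimestamp
    # Which Helm releases are installed here, and in what status?
    run helm list -n "$ns"
}
```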

5. Practice with chaos scripts

make chaos LIST=1                                          # See scripts
training/interactive/chaos/scripts/kill_pods.sh --dry-run  # Preview
training/interactive/chaos/scripts/kill_pods.sh --yes --namespace grokdevops
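To make the preview habit automatic, a wrapper can always run --dry-run first and ask before the real run. This sketch assumes the target script accepts the same --dry-run, --yes, and --namespace flags shown for kill_pods.sh (the README shows them for that script only); safe_chaos is a hypothetical helper.

```shell
# Hypothetical wrapper: preview with --dry-run, then confirm before the real run.
# Assumes the script takes the same flags as kill_pods.sh.
# DRY_RUN=1 prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

safe_chaos() {
    script="$1"
    ns="${2:-grokdevops}"
    if [ -z "$script" ]; then
        echo "usage: safe_chaos SCRIPT [NAMESPACE]" >&2
        return 2
    fi
    run "$script" --dry-run || return 1
    printf 'Run %s for real in namespace %s? [y/N] ' "$script" "$ns" >&2
    read -r answer || answer=""
    case "$answer" in
        y|Y|yes)
            run "$script" --yes --namespace "$ns" ;;
        *)
            echo "Skipped." >&2 ;;
    esac
}
```

Anything other than y, Y, or yes (including an empty answer) skips the real run.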

6. Run incident challenges

make incident                    # Preview a random incident (dry-run)
make incident YES=1              # Inject a random incident
make incident-status             # Check status + elapsed time
make incident-forensics          # Capture evidence bundle
# ... diagnose and fix the issue ...
make incident-resolve            # Mark as resolved, record time

# Challenge mode (time-boxed)
make challenge YES=1 MINUTES=10  # Inject + start timer
make incident-list               # See all 18 scenarios
make scoreboard                  # View your performance history

7. Study runbooks and interview scenarios

make runbook LIST=1    # List all runbooks
ls training/library/runbooks/  # Browse directly
ls training/library/interview-scenarios/
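During an incident, a plain case-insensitive grep over the runbooks directory is often enough to find the right runbook. A small sketch; find_runbooks is a hypothetical helper, and it assumes the runbooks are text files under training/library/runbooks/.

```shell
# Hypothetical helper: list runbook files that mention KEYWORD (case-insensitive).
find_runbooks() {
    keyword="$1"
    dir="${2:-training/library/runbooks}"
    if [ -z "$keyword" ]; then
        echo "usage: find_runbooks KEYWORD [DIR]" >&2
        return 2
    fi
    # -r: recurse, -i: ignore case, -l: print matching file names only.
    grep -ril -- "$keyword" "$dir" | sort
}
```

For example, find_runbooks crashloop lists every runbook that mentions CrashLoopBackOff in any letter case.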

8. Tear everything down

make undeploy-all   # Remove all deployed resources

Directory Structure

training/
├── README.md                          # You are here
├── START_HERE.md                      # 10-minute warm-up + orientation
├── catalog.md                         # Asset registry inventory (maintainer view)
├── kubectl-debugging-cheatsheet.md    # Dense command reference
├── interactive/                       # Hands-on, executable content
│   ├── exercises/                     # 250 break/fix exercises across 5 tracks
│   ├── runtime-labs/                  # 8 hands-on labs (break -> fix -> verify)
│   ├── incidents/                     # Incident simulator (18 scenarios)
│   ├── investigation/                 # Guided investigation engine (playbooks, hints)
│   ├── chaos/                         # 7 safe, reversible chaos scripts
│   ├── assessments/                   # Scorecards and self-assessment
│   ├── knowledge/                     # Flashcards and spaced-repetition data
│   └── registry/                      # Canonical asset registry
└── library/                           # Reference and study material
    ├── portal/                        # Index of indexes (topics, paths, artifacts)
    ├── runbooks/                      # Incident response runbooks
    ├── interview-scenarios/           # DevOps interview prep scenarios
    ├── drills/                        # Muscle-memory exercises
    ├── skillchecks/                   # Self-assessment skill checks
    ├── cheatsheets/                   # Quick-reference cheat sheets
    ├── topics/                        # Deep-dive topic packs
    ├── scenarios/                     # Multi-step troubleshooting scenarios
    ├── solutions/                     # Hint ladders + answer keys for labs
    ├── curriculum/                    # Structured learning paths
    │   ├── tracks/                    # 10 skill-based tracks
    │   ├── levels/                    # 5 progressive levels
    │   └── coverage/                  # Coverage map and gaps
    ├── domains/                       # Domain-specific content
    ├── guides/                        # CI pipeline, DevOps roadmap guides
    └── reference/                     # Knowledge architecture + lookup material
        └── knowledge-architecture/    # Concept/failure/command intelligence

Prerequisites

  • A running k3s cluster (or any Kubernetes cluster)
  • kubectl and helm installed and configured
  • This repo cloned locally
  • For runtime labs: a successful run of make deploy-all
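A quick preflight before deploying can catch a missing tool or an unreachable cluster early. A minimal sketch: it checks kubectl and helm from the list above, plus make, which every Quickstart command relies on, then runs kubectl cluster-info against the current context. preflight and SKIP_CLUSTER_CHECK are hypothetical names.

```shell
# Hypothetical preflight sketch: required tools on PATH, then a reachable cluster.
# Set SKIP_CLUSTER_CHECK=1 to check tools only.
preflight() {
    missing=0
    for tool in kubectl helm make; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 1
    if [ "${SKIP_CLUSTER_CHECK:-0}" != 1 ]; then
        # cluster-info fails if the current kubectl context cannot reach an API server.
        if ! kubectl cluster-info >/dev/null 2>&1; then
            echo "cluster unreachable with the current kubectl context" >&2
            return 1
        fi
    fi
    echo "preflight ok"
}
```

Missing tools are reported one per line on stderr, so you can fix them all in one pass.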