Production Readiness Review: Study Plans
These study plans are generated based on your assessment results and scoring. Each plan cross-references real topic packs, case studies, and drills from the grokdevops wiki.
By Weak Section
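Each subsection below applies when a section score falls under the threshold in its heading. As a rough sketch of that selection logic (the names `SECTION_THRESHOLDS` and `weak_sections` are hypothetical and not part of the grokdevops tooling; the section labels and numbers are copied from the headings on this page):

```python
# Thresholds copied from the "If <section> < N" headings on this page.
# SECTION_THRESHOLDS and weak_sections are illustrative names only.
SECTION_THRESHOLDS = {
    "Kubernetes Operations": 20,
    "Observability": 16,
    "Networking": 14,
    "Linux & Infrastructure": 14,
    "Security": 12,
    "CI/CD & DevOps Tooling": 12,
    "Cross-Domain & Incident Response": 12,
}


def weak_sections(scores):
    """Return sections scored strictly below their threshold.

    Sections missing from `scores` are skipped rather than treated as weak
    (an assumption made for this sketch).
    """
    return [
        section
        for section, threshold in SECTION_THRESHOLDS.items()
        if section in scores and scores[section] < threshold
    ]


scores = {"Kubernetes Operations": 18, "Observability": 16, "Security": 9}
print(weak_sections(scores))  # ['Kubernetes Operations', 'Security']
```

A score equal to the threshold does not trigger a plan, which matches the strict "<" used in the headings.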
If Kubernetes Operations < 20
You need: Practical K8s troubleshooting under pressure. You may understand the concepts but lack muscle memory for the specific commands and diagnostic flows.
2-week plan:
Week 1 — Core Operations and Debugging
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: K8s Operations | k8s-ops/primer.md |
| Tue | Primer: Pods and Scheduling | k8s-pods-and-scheduling/primer.md |
| Wed | Street Ops: K8s Debugging Playbook | k8s-debugging-playbook/street_ops.md |
| Thu | Case Study: CrashLoopBackOff | case-studies/kubernetes_ops/crashloopbackoff-no-logs/ |
| Fri | Case Study: Resource Quota Blocking | case-studies/kubernetes_ops/resource-quota-blocking-deploy/ |
| Sat | Primer: K8s Storage | k8s-storage/primer.md |
| Sun | Case Study: PV Stuck Terminating | case-studies/kubernetes_ops/persistent-volume-stuck-terminating/ |
Week 2 — Networking, Scaling, and Node Operations
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: K8s Networking | k8s-networking/primer.md |
| Tue | Primer: K8s Services and Ingress | k8s-services-and-ingress/primer.md |
| Wed | Street Ops: K8s HPA | k8s-ops (HPA)/street_ops.md |
| Thu | Case Study: CNI Broken After Restart | case-studies/kubernetes_ops/cni-broken-after-restart/ |
| Fri | Case Study: Drain Blocked by PDB | case-studies/kubernetes_ops/drain-blocked-by-pdb/ |
| Sat | Primer: Node Lifecycle | k8s-node-lifecycle/primer.md |
| Sun | Case Study: Node Pressure Evictions | case-studies/kubernetes_ops/node-pressure-evictions/ |
Daily drill (15 min): python3 tools/run_training_session.py build --strategy spaced --topic k8s --count 10
Capstone: Complete all 10 Kubernetes case studies, then re-score Section 1.
If Observability < 16
You need: Comfort with PromQL, Loki queries, and tracing. You should be able to investigate an incident using the observability stack without hesitation.
2-week plan:
Week 1 — Metrics and Alerting
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: Prometheus | prometheus-deep-dive/primer.md |
| Tue | Street Ops: Prometheus | prometheus-deep-dive/street_ops.md |
| Wed | Primer: Alerting Rules | alerting-rules/primer.md |
| Thu | Primer: Monitoring Fundamentals | monitoring-fundamentals/primer.md |
| Fri | Footguns: Prometheus | prometheus-deep-dive/footguns.md |
| Sat | Primer: SLO Tooling | slo-tooling/primer.md |
| Sun | Primer: Postmortem and SLO | postmortem-slo/primer.md |
Week 2 — Logs, Traces, and Integration
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: Logging | logging/primer.md |
| Tue | Primer: Log Pipelines | log-pipelines/primer.md |
| Wed | Primer: Tracing | tracing/primer.md |
| Thu | Primer: OpenTelemetry | opentelemetry/primer.md |
| Fri | Primer: Observability Deep Dive | observability-deep-dive/primer.md |
| Sat | Case Study: Grafana Empty / NetworkPolicy | case-studies/cross-domain/grafana-empty-prometheus-networkpolicy/ |
| Sun | Case Study: Disk Full / Runaway Logs / Loki | case-studies/cross-domain/disk-full-runaway-logs-loki/ |
Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic observability --count 8
Capstone: Write 5 PromQL queries from memory: error rate, latency percentile, saturation, SLO burn rate, and cardinality count.
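For orientation, here is one possible shape for each of the five capstone queries, kept as Python strings so they can be pasted into a notebook or script. This is a sketch under assumptions: the metric and label names (`http_requests_total`, `status`, `http_request_duration_seconds_bucket`, `node_cpu_seconds_total`) are common conventions, not names confirmed by this wiki, and the 99.9% SLO target and the helper `burn_rate_query` are hypothetical.

```python
# Illustrative PromQL for the five capstone queries, kept as Python strings.
# Metric and label names are assumptions; substitute what your services export.
ERROR_RATIO = (
    'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    " / sum(rate(http_requests_total[5m]))"
)

CAPSTONE_QUERIES = {
    # Fraction of requests answered with a 5xx status over the last 5 minutes.
    "error rate": ERROR_RATIO,
    # 99th percentile latency computed from histogram buckets.
    "latency percentile": (
        "histogram_quantile(0.99, "
        "sum by (le) (rate(http_request_duration_seconds_bucket[5m])))"
    ),
    # CPU saturation per instance: 1 minus the idle fraction.
    "saturation": (
        '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'
    ),
    # Total number of series currently visible to the query.
    "cardinality count": 'count({__name__=~".+"})',
}


def burn_rate_query(slo_target, window="1h"):
    """Burn rate = observed error ratio / error budget (1 - SLO target)."""
    ratio = ERROR_RATIO.replace("[5m]", f"[{window}]")
    return f"({ratio}) / {1 - slo_target:g}"


# Assumed SLO target of 99.9% for illustration.
CAPSTONE_QUERIES["SLO burn rate"] = burn_rate_query(0.999)

for name, query in CAPSTONE_QUERIES.items():
    print(f"{name}: {query}")
```

A burn rate of 1 means the error budget is being spent exactly as fast as the SLO allows; values above 1 mean it will run out before the SLO period ends. Treat these as a reference to check against after attempting the queries from memory.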
If Networking < 14
You need: Practical network debugging skills for a Kubernetes + cloud environment. DNS, TLS, MTU, and ingress troubleshooting are the priority.
2-week plan:
Week 1 — Fundamentals
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: DNS Deep Dive | dns-deep-dive/primer.md |
| Tue | Primer: TLS | tls/primer.md |
| Wed | Primer: TLS Certificates Ops | tls-certificates-ops/primer.md |
| Thu | Street Ops: Networking Troubleshooting | networking-troubleshooting/street_ops.md |
| Fri | Primer: MTU | mtu/primer.md |
| Sat | Case Study: MTU Blackhole TLS Stalls | case-studies/networking/mtu-blackhole-tls-stalls/ |
| Sun | Case Study: DNS Resolution Slow | case-studies/networking/dns-resolution-slow/ |
Week 2 — Kubernetes Networking and Ingress
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: K8s Networking | k8s-networking/primer.md |
| Tue | Primer: K8s Services and Ingress | k8s-services-and-ingress/primer.md |
| Wed | Primer: NGINX Web Servers | nginx-web-servers/primer.md |
| Thu | Primer: Load Balancing | load-balancing/primer.md |
| Fri | Case Study: SSL Cert Chain Incomplete | case-studies/networking/ssl-cert-chain-incomplete/ |
| Sat | Case Study: DNS TLS cert-manager | case-studies/cross-domain/dns-tls-certmanager/ |
| Sun | Case Study: CoreDNS Timeout Pod DNS | case-studies/kubernetes_ops/coredns-timeout-pod-dns/ |
Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic networking --count 8
Capstone: Trace a full request path from client to pod using tcpdump, curl -v, and Ingress-NGINX logs.
If Linux & Infrastructure < 14
You need: Confidence in Linux system debugging and infrastructure-as-code. You must be able to diagnose disk, memory, and process issues on nodes, and understand Terraform and Ansible well enough to troubleshoot failures.
2-week plan:
Week 1 — Linux Operations
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: Linux Ops | linux-ops/primer.md |
| Tue | Primer: Linux Performance | linux-performance/primer.md |
| Wed | Street Ops: Linux Performance | linux-performance/street_ops.md |
| Thu | Primer: Linux Memory Management | linux-memory-management/primer.md |
| Fri | Case Study: OOM Killer Events | case-studies/linux_ops/oom-killer-events/ |
| Sat | Primer: Disk and Storage Ops | disk-and-storage-ops/primer.md |
| Sun | Case Study: Runaway Logs Fill Disk | case-studies/linux_ops/runaway-logs-fill-disk/ |
Week 2 — Infrastructure as Code
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: Terraform | terraform/primer.md |
| Tue | Street Ops: Terraform Deep Dive | terraform-deep-dive/street_ops.md |
| Wed | Primer: Ansible | ansible/primer.md |
| Thu | Street Ops: Ansible Deep Dive | ansible-deep-dive/street_ops.md |
| Fri | Case Study: Time Sync Skew Breaks App | case-studies/linux_ops/time-sync-skew-breaks-app/ |
| Sat | Case Study: Terraform State Lock DynamoDB | case-studies/cross-domain/terraform-state-lock-dynamodb/ |
| Sun | Case Study: Node NotReady NIC Firmware Ansible | case-studies/cross-domain/node-notready-nic-firmware-ansible/ |
Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic linux --count 8
Capstone: On a test node, diagnose a synthetic disk/memory/process issue using only top, iostat, vmstat, df, du, lsof, dmesg, and journalctl.
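As a starting point for the capstone, the sketch below pairs each of the eight tools with one read-only invocation. The specific flags and the `/var` path are suggestions rather than part of the exercise, and the names `TRIAGE_COMMANDS`, `available_tools`, and `run_tool` are hypothetical.

```python
import shutil
import subprocess

# One read-only starting command per tool named in the capstone.
# Flags and the /var path are suggestions, not requirements of the exercise.
TRIAGE_COMMANDS = {
    "top": ["top", "-b", "-n", "1"],        # batch mode, a single snapshot
    "iostat": ["iostat", "-x", "1", "3"],   # extended device stats, 3 samples
    "vmstat": ["vmstat", "1", "5"],         # memory, swap, CPU; 5 samples
    "df": ["df", "-h"],                     # filesystem usage
    "du": ["du", "-xh", "-d", "1", "/var"],  # per-directory usage, one level
    "lsof": ["lsof", "+L1"],                # open files already unlinked
    "dmesg": ["dmesg", "-T"],               # kernel ring buffer, readable times
    "journalctl": ["journalctl", "-p", "err", "-b", "--no-pager"],
}


def available_tools():
    """Return the tools from TRIAGE_COMMANDS that are installed on this host."""
    return [name for name in TRIAGE_COMMANDS if shutil.which(name)]


def run_tool(name, timeout=30):
    """Run one triage command and return its output (stderr if stdout is empty)."""
    result = subprocess.run(
        TRIAGE_COMMANDS[name], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout or result.stderr


if __name__ == "__main__":
    for tool in available_tools():
        print(f"== {tool} ==")
        print(run_tool(tool)[:500])  # first 500 characters of each output
```

Some of these (notably `dmesg` and `journalctl`) may need elevated privileges depending on the node's configuration, in which case `run_tool` returns the error text instead.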
If Security < 12
You need: Understanding of the security stack (Vault, cert-manager, OPA) and incident response procedures for secret leaks and misconfigurations.
2-week plan:
Week 1 — Secrets and Certificate Management
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: HashiCorp Vault | hashicorp-vault/primer.md |
| Tue | Street Ops: Vault | hashicorp-vault/street_ops.md |
| Wed | Primer: Secrets Management | secrets-management/primer.md |
| Thu | Primer: cert-manager | cert-manager/primer.md |
| Fri | Primer: TLS PKI | tls-pki/primer.md |
| Sat | Case Study: Deployment Stuck ImagePull Vault | case-studies/cross-domain/deployment-stuck-imagepull-vault/ |
| Sun | Case Study: DNS TLS cert-manager | case-studies/cross-domain/dns-tls-certmanager/ |
Week 2 — Policy, Hardening, and Incident Response
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: Policy Engines (OPA) | policy-engines/primer.md |
| Tue | Primer: Security Basics | security-basics/primer.md |
| Wed | Primer: Linux Hardening | linux-hardening/primer.md |
| Thu | Primer: K8s RBAC | k8s-rbac/primer.md |
| Fri | Primer: Container Image Scanning | container-images/primer.md |
| Sat | Case Study: Container Vuln Scanner False Positive | case-studies/cross-domain/container-vuln-scanner-false-positive/ |
| Sun | Primer: Supply Chain Security | supply-chain-security/primer.md |
Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic security --count 8
Capstone: Walk through a complete secret leak response: revoke, rotate, audit, and harden.
If CI/CD & DevOps Tooling < 12
You need: Fluency with the GitHub Actions + ArgoCD + Helm pipeline. You must be able to debug build failures, resolve GitOps drift, and execute safe rollbacks.
2-week plan:
Week 1 — CI/CD Fundamentals
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: GitHub Actions | github-actions/primer.md |
| Tue | Street Ops: GitHub Actions | github-actions/street_ops.md |
| Wed | Primer: CI/CD Patterns | ci-cd-patterns/primer.md |
| Thu | Primer: Docker | docker/primer.md |
| Fri | Primer: Container Image Optimization | container-images/primer.md |
| Sat | Case Study: CI Pipeline Docker Cache Registry | case-studies/cross-domain/ci-pipeline-docker-cache-registry/ |
| Sun | Footguns: GitHub Actions | github-actions/footguns.md |
Week 2 — GitOps, Helm, and Deployment Strategies
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: ArgoCD GitOps | argocd-gitops/primer.md |
| Tue | Street Ops: ArgoCD GitOps | argocd-gitops/street_ops.md |
| Wed | Primer: Helm | helm/primer.md |
| Thu | Street Ops: Helm | helm/street_ops.md |
| Fri | Primer: Progressive Delivery | progressive-delivery/primer.md |
| Sat | Case Study: Canary Deploy Wrong Backend Ingress | case-studies/cross-domain/canary-deploy-wrong-backend-ingress/ |
| Sun | Case Study: Pod OOMKilled Sidecar Helm | case-studies/cross-domain/pod-oomkilled-sidecar-helm/ |
Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic cicd --count 8
Capstone: Deploy a Helm chart via ArgoCD, introduce a breaking change, diagnose the drift, and roll back.
If Cross-Domain & Incident Response < 12
You need: Practice synthesizing information across domains under time pressure. Your knowledge of individual domains may be adequate; the gap is correlating signals across them and responding.
2-week plan:
Week 1 — Incident Response Foundations
| Day | Activity | Resource |
|---|---|---|
| Mon | Primer: Incident Command | incident-command/primer.md |
| Tue | Street Ops: Incident Command | incident-command/street_ops.md |
| Wed | Primer: Incident Triage | incident-triage/primer.md |
| Thu | Primer: Incident Psychology | incident-psychology/primer.md |
| Fri | Primer: Postmortem and SLO | postmortem-slo/primer.md |
| Sat | Primer: Disaster Recovery | disaster-recovery/primer.md |
| Sun | Primer: Runbook Craft | runbook-craft/primer.md |
Week 2 — Cross-Domain Case Studies
| Day | Activity | Resource |
|---|---|---|
| Mon | Case Study: Alert Storm Flapping Healthchecks | case-studies/cross-domain/alert-storm-flapping-healthchecks/ |
| Tue | Case Study: API Latency BGP Route Leak ACL | case-studies/cross-domain/api-latency-bgp-route-leak-acl/ |
| Wed | Case Study: HPA Flapping Clock Skew NTP | case-studies/cross-domain/hpa-flapping-clock-skew-ntp/ |
| Thu | Case Study: Service Mesh 503 Envoy RBAC | case-studies/cross-domain/service-mesh-503-envoy-rbac/ |
| Fri | Case Study: Database Replication Lag RAID | case-studies/cross-domain/database-replication-lag-raid/ |
| Sat | Case Study: Job Queue CPU Throttle cgroup | case-studies/cross-domain/job-queue-cpu-throttle-cgroup/ |
| Sun | Case Study: SSH Timeout MTU Terraform | case-studies/cross-domain/ssh-timeout-mtu-terraform/ |
Daily drill (15 min): python3 tools/run_training_session.py build --strategy spaced --mode mixed --count 10
Capstone: Run 3 incident scenarios back-to-back under time pressure: python3 tools/run_scenario.py
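The daily drill commands in the seven plans above differ only in their topic (or mode) and question count. A small sketch that assembles them, assuming nothing beyond the flags already shown on this page (the plan keys and the `drill_command` helper are hypothetical):

```python
import shlex

# (value for --topic, value for --count) per section plan, copied from the
# "Daily drill" lines on this page. The dictionary keys are illustrative.
DRILLS = {
    "kubernetes": ("k8s", 10),
    "observability": ("observability", 8),
    "networking": ("networking", 8),
    "linux": ("linux", 8),
    "security": ("security", 8),
    "cicd": ("cicd", 8),
}

BASE = ["python3", "tools/run_training_session.py", "build", "--strategy", "spaced"]


def drill_command(plan):
    """Build the daily drill command line for one section plan."""
    if plan == "cross-domain":
        # The cross-domain plan uses --mode mixed instead of a single topic.
        args = ["--mode", "mixed", "--count", "10"]
    else:
        topic, count = DRILLS[plan]
        args = ["--topic", topic, "--count", str(count)]
    return shlex.join(BASE + args)


print(drill_command("kubernetes"))
print(drill_command("cross-domain"))
```

The two calls print the Kubernetes and cross-domain drill commands exactly as they appear in their respective plans.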
Fast Track (4 weeks)
For engineers scoring 60-119 who need to get on-call ready quickly. This plan hits the highest-impact topics from each section, prioritizing hands-on practice over reading.
Week 1: Kubernetes and Linux Core
| Day | Morning (45 min) | Evening (30 min) |
|---|---|---|
| Mon | k8s-ops/primer.md | Case: CrashLoopBackOff |
| Tue | k8s-debugging-playbook/street_ops.md | Case: Service No Endpoints |
| Wed | linux-ops/primer.md | Case: OOM Killer |
| Thu | linux-performance/street_ops.md | Case: Runaway Logs |
| Fri | k8s-networking/primer.md | Case: CNI Broken |
| Sat | k8s-storage/primer.md | Case: PV Stuck |
| Sun | Review + spaced drill (60 min) | python3 tools/run_training_session.py build --strategy spaced --count 20 |
Week 2: Observability and Networking
| Day | Morning (45 min) | Evening (30 min) |
|---|---|---|
| Mon | prometheus-deep-dive/primer.md | prometheus-deep-dive/street_ops.md |
| Tue | alerting-rules/primer.md | Case: Grafana Empty |
| Wed | dns-deep-dive/primer.md | Case: DNS Slow |
| Thu | tls-certificates-ops/primer.md | Case: SSL Cert Chain |
| Fri | tracing/primer.md + opentelemetry/primer.md | Case: MTU Blackhole |
| Sat | networking-troubleshooting/street_ops.md | Case: DNS TLS cert-manager |
| Sun | Review + spaced drill (60 min) | python3 tools/run_training_session.py build --strategy spaced --count 20 |
Week 3: Security, CI/CD, and IaC
| Day | Morning (45 min) | Evening (30 min) |
|---|---|---|
| Mon | hashicorp-vault/primer.md | hashicorp-vault/street_ops.md |
| Tue | cert-manager/primer.md | Case: Deployment Stuck Vault |
| Wed | github-actions/primer.md | Case: CI Pipeline Docker |
| Thu | argocd-gitops/primer.md | argocd-gitops/street_ops.md |
| Fri | helm/primer.md | Case: Canary Deploy |
| Sat | terraform/primer.md | Case: Terraform State Lock |
| Sun | Review + spaced drill (60 min) | python3 tools/run_training_session.py build --strategy spaced --count 20 |
Week 4: Incident Response and Cross-Domain
| Day | Morning (45 min) | Evening (30 min) |
|---|---|---|
| Mon | incident-command/primer.md | incident-command/street_ops.md |
| Tue | incident-triage/primer.md | Case: Alert Storm |
| Wed | postmortem-slo/primer.md | Case: HPA Flapping NTP |
| Thu | disaster-recovery/primer.md | Case: Database Replication |
| Fri | Cross-domain case study marathon | 3 random case studies |
| Sat | Full re-assessment | assessment.md |
| Sun | Gap analysis and targeted review | Review weakest 10 questions from answer key |
Daily (every day for 4 weeks): Run a 15-minute spaced repetition drill before starting the morning session.
Deep Track (8 weeks)
For engineers scoring below 60 or those who want comprehensive mastery. This plan covers all topics with depth, including footguns, trivia, and the full case study library.
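The score bands given for the two tracks can be summarized in a few lines. This is a sketch only: the function name `pick_track` is hypothetical, and because this page does not say what applies at 120 and above, the function returns `None` there as a placeholder. Engineers who simply want comprehensive mastery can choose the Deep Track regardless of score, which the function does not model.

```python
def pick_track(total_score):
    """Map a total assessment score to a study track.

    Bands from this page: below 60 -> Deep Track, 60-119 -> Fast Track.
    Scores of 120 and above are not covered here, so None is returned
    (an assumption made for this sketch).
    """
    if total_score < 60:
        return "deep"
    if total_score <= 119:
        return "fast"
    return None


for score in (45, 60, 119, 120):
    print(score, pick_track(score))
```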
Phase 1: Foundations (Weeks 1-2)
Goal: Build the core knowledge base across all domains.
Week 1:
| Day | Topic | Resources |
|---|---|---|
| Mon | Linux fundamentals | linux-ops/primer.md, linux-users-and-permissions/primer.md |
| Tue | Linux performance | linux-performance/primer.md, linux-memory-management/primer.md |
| Wed | Linux storage and disk | disk-and-storage-ops/primer.md, mounts-filesystems/primer.md |
| Thu | Linux networking | linux-ops/primer.md, networking/primer.md |
| Fri | Process management | process-management/primer.md, linux-signals-and-process-control/primer.md |
| Sat | Systemd and logging | linux-ops-systemd/primer.md, linux-logging/primer.md |
| Sun | Linux case studies (2) | oom-killer-events, runaway-logs-fill-disk |
Week 2:
| Day | Topic | Resources |
|---|---|---|
| Mon | Containers | docker/primer.md, containers-deep-dive/primer.md |
| Tue | Container images | container-images/primer.md |
| Wed | Networking fundamentals | tcp-ip-deep-dive/primer.md, dns-deep-dive/primer.md |
| Thu | TLS and certificates | tls/primer.md, tls-pki/primer.md, tls-certificates-ops/primer.md |
| Fri | Git and version control | git/primer.md, git-advanced/primer.md |
| Sat | YAML/JSON/config | yaml-json-config/primer.md, environment-variables/primer.md |
| Sun | Networking case studies (2) | dns-resolution-slow, mtu-blackhole-tls-stalls |
Phase 2: Kubernetes Deep Dive (Weeks 3-4)
Week 3:
| Day | Topic | Resources |
|---|---|---|
| Mon | K8s core operations | k8s-ops/primer.md, k8s-ops/street_ops.md |
| Tue | Pods and scheduling | k8s-pods-and-scheduling/primer.md, k8s-ops (Probes)/primer.md |
| Wed | Networking and services | k8s-networking/primer.md, k8s-services-and-ingress/primer.md |
| Thu | Storage | k8s-storage/primer.md |
| Fri | Debugging playbook | k8s-debugging-playbook/primer.md, k8s-debugging-playbook/street_ops.md |
| Sat | HPA and scaling | k8s-ops (HPA)/primer.md, k8s-ops (HPA)/street_ops.md |
| Sun | K8s case studies (3) | crashloopbackoff, resource-quota, service-no-endpoints |
Week 4:
| Day | Topic | Resources |
|---|---|---|
| Mon | Node lifecycle | k8s-node-lifecycle/primer.md, node-maintenance/primer.md |
| Tue | RBAC and security | k8s-rbac/primer.md, k8s-rbac/street_ops.md |
| Wed | K8s ecosystem | k8s-ecosystem/primer.md |
| Thu | Helm deep dive | helm/primer.md, helm/street_ops.md, helm/footguns.md |
| Fri | Kustomize and config | kustomize/primer.md |
| Sat | K8s case studies (3) | cni-broken, drain-blocked, node-pressure |
| Sun | Mid-point re-assessment (Sections 1 + 4 only) | assessment.md |
Phase 3: Observability and Security (Weeks 5-6)
Week 5:
| Day | Topic | Resources |
|---|---|---|
| Mon | Prometheus | prometheus-deep-dive/primer.md, prometheus-deep-dive/street_ops.md |
| Tue | Grafana and dashboards | monitoring-fundamentals/primer.md, observability-deep-dive/primer.md |
| Wed | Alerting | alerting-rules/primer.md, postmortem-slo/primer.md |
| Thu | Logging and Loki | logging/primer.md, log-pipelines/primer.md |
| Fri | Tracing and OTel | tracing/primer.md, opentelemetry/primer.md |
| Sat | SLO tooling | slo-tooling/primer.md, dora-metrics/primer.md |
| Sun | Observability case studies | grafana-empty, disk-full-loki |
Week 6:
| Day | Topic | Resources |
|---|---|---|
| Mon | Vault | hashicorp-vault/primer.md, hashicorp-vault/street_ops.md |
| Tue | Secrets management | secrets-management/primer.md, secrets-management/footguns.md |
| Wed | cert-manager and TLS | cert-manager/primer.md, tls-certificates-ops/street_ops.md |
| Thu | OPA and policy | policy-engines/primer.md, security-basics/primer.md |
| Fri | Container security | container-images/primer.md, security-scanning/primer.md |
| Sat | Linux hardening | linux-hardening/primer.md, selinux-apparmor/primer.md |
| Sun | Security case studies | deployment-stuck-vault, container-vuln-scanner |
Phase 4: CI/CD, IaC, and Incident Response (Weeks 7-8)
Week 7:
| Day | Topic | Resources |
|---|---|---|
| Mon | GitHub Actions | github-actions/primer.md, github-actions/street_ops.md |
| Tue | ArgoCD and GitOps | argocd-gitops/primer.md, argocd-gitops/street_ops.md |
| Wed | CI/CD patterns | ci-cd-patterns/primer.md, progressive-delivery/primer.md |
| Thu | Terraform | terraform/primer.md, terraform-deep-dive/primer.md |
| Fri | Ansible | ansible/primer.md, ansible-deep-dive/primer.md |
| Sat | IaC case studies | terraform-state-lock, ansible-ssh-agent |
| Sun | CI/CD case studies | ci-pipeline-docker, canary-deploy |
Week 8:
| Day | Topic | Resources |
|---|---|---|
| Mon | Incident command | incident-command/primer.md, incident-command/street_ops.md |
| Tue | Incident triage | incident-triage/primer.md, incident-psychology/primer.md |
| Wed | Postmortems and SLOs | postmortem-slo/primer.md, postmortem-slo/street_ops.md |
| Thu | Disaster recovery | disaster-recovery/primer.md, backup-restore/primer.md |
| Fri | Cross-domain case study marathon (3) | alert-storm, hpa-flapping-ntp, service-mesh-503 |
| Sat | Cross-domain case study marathon (3) | database-replication, job-queue-cgroup, node-notready-ansible |
| Sun | Full re-assessment | assessment.md — compare against initial scores |
Daily (every day for 8 weeks): Run a 15-minute spaced repetition drill.
Weekly (every Sunday): Run the study recommendation engine to adjust focus.
Quick Reference: Topic Pack Index
These are the most relevant topic packs for the production readiness assessment, organized by section: