Production Readiness Review: Study Plans

These study plans are generated from your assessment scores. Each plan cross-references real topic packs, case studies, and drills from the grokdevops wiki.


By Weak Section

If Kubernetes Operations < 20

You need: Practical K8s troubleshooting under pressure. You may understand the concepts but lack muscle memory for the specific commands and diagnostic flows.

2-week plan:

Week 1 — Core Operations and Debugging

Day Activity Resource
Mon Primer: K8s Operations k8s-ops/primer.md
Tue Primer: Pods and Scheduling k8s-pods-and-scheduling/primer.md
Wed Street Ops: K8s Debugging Playbook k8s-debugging-playbook/street_ops.md
Thu Case Study: CrashLoopBackOff case-studies/kubernetes_ops/crashloopbackoff-no-logs/
Fri Case Study: Resource Quota Blocking case-studies/kubernetes_ops/resource-quota-blocking-deploy/
Sat Primer: K8s Storage k8s-storage/primer.md
Sun Case Study: PV Stuck Terminating case-studies/kubernetes_ops/persistent-volume-stuck-terminating/

Week 2 — Networking, Scaling, and Node Operations

Day Activity Resource
Mon Primer: K8s Networking k8s-networking/primer.md
Tue Primer: K8s Services and Ingress k8s-services-and-ingress/primer.md
Wed Street Ops: K8s HPA k8s-ops (HPA)/street_ops.md
Thu Case Study: CNI Broken After Restart case-studies/kubernetes_ops/cni-broken-after-restart/
Fri Case Study: Drain Blocked by PDB case-studies/kubernetes_ops/drain-blocked-by-pdb/
Sat Primer: Node Lifecycle k8s-node-lifecycle/primer.md
Sun Case Study: Node Pressure Evictions case-studies/kubernetes_ops/node-pressure-evictions/

Daily drill (15 min): python3 tools/run_training_session.py build --strategy spaced --topic k8s --count 10

Capstone: Complete all 10 Kubernetes case studies, then re-score Section 1.


If Observability < 16

You need: Comfort with PromQL, Loki queries, and tracing, so that you can investigate an incident using the observability stack without hesitation.

2-week plan:

Week 1 — Metrics and Alerting

Day Activity Resource
Mon Primer: Prometheus prometheus-deep-dive/primer.md
Tue Street Ops: Prometheus prometheus-deep-dive/street_ops.md
Wed Primer: Alerting Rules alerting-rules/primer.md
Thu Primer: Monitoring Fundamentals monitoring-fundamentals/primer.md
Fri Footguns: Prometheus prometheus-deep-dive/footguns.md
Sat Primer: SLO Tooling slo-tooling/primer.md
Sun Primer: Postmortem and SLO postmortem-slo/primer.md

Week 2 — Logs, Traces, and Integration

Day Activity Resource
Mon Primer: Logging logging/primer.md
Tue Primer: Log Pipelines log-pipelines/primer.md
Wed Primer: Tracing tracing/primer.md
Thu Primer: OpenTelemetry opentelemetry/primer.md
Fri Primer: Observability Deep Dive observability-deep-dive/primer.md
Sat Case Study: Grafana Empty / NetworkPolicy case-studies/cross-domain/grafana-empty-prometheus-networkpolicy/
Sun Case Study: Disk Full / Runaway Logs / Loki case-studies/cross-domain/disk-full-runaway-logs-loki/

Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic observability --count 8

Capstone: Write 5 PromQL queries from memory: error rate, latency percentile, saturation, SLO burn rate, and cardinality count.
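As a reference shape for the capstone, the sketch below lists one possible PromQL expression per query type and works the burn-rate arithmetic in Python. The metric names (`http_requests_total` with a `status` label, `http_request_duration_seconds_bucket`), the 5-minute windows, and the 99.9% SLO target are assumptions for illustration and are not taken from the wiki; `node_cpu_seconds_total` is the CPU metric exported by node_exporter.

```python
# Illustrative PromQL for the five capstone queries.
# Metric names, label names, windows, and the SLO target are assumptions;
# substitute the ones your services actually export.
SLO_TARGET = 0.999  # assumed availability objective

ERROR_RATIO = (
    'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    " / sum(rate(http_requests_total[5m]))"
)

QUERIES = {
    # Fraction of requests answered with a 5xx status.
    "error_rate": ERROR_RATIO,
    # 99th percentile latency from a classic histogram.
    "latency_p99": (
        "histogram_quantile(0.99, "
        "sum by (le) (rate(http_request_duration_seconds_bucket[5m])))"
    ),
    # CPU busy fraction per instance, used here as a simple saturation signal.
    "cpu_saturation": (
        '1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'
    ),
    # Error ratio divided by the error budget (1 - SLO target).
    "slo_burn_rate": f"({ERROR_RATIO}) / (1 - {SLO_TARGET})",
    # Ten metric names with the most active series.
    "series_per_metric": 'topk(10, count by (__name__) ({__name__=~".+"}))',
}


def burn_rate(error_ratio: float, slo_target: float = SLO_TARGET) -> float:
    """Observed error ratio divided by the error budget (1 - SLO target)."""
    return error_ratio / (1.0 - slo_target)


if __name__ == "__main__":
    for name, expr in QUERIES.items():
        print(f"{name}: {expr}")
    # 0.5% errors against a 99.9% SLO consumes budget roughly 5x too fast.
    print(round(burn_rate(0.005), 2))
```

A burn rate of 1 means the error budget would be used up exactly at the end of the SLO period; values above 1 mean it runs out early.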


If Networking < 14

You need: Practical network debugging skills for a Kubernetes + cloud environment. DNS, TLS, MTU, and ingress troubleshooting are the priority.

2-week plan:

Week 1 — Fundamentals

Day Activity Resource
Mon Primer: DNS Deep Dive dns-deep-dive/primer.md
Tue Primer: TLS tls/primer.md
Wed Primer: TLS Certificates Ops tls-certificates-ops/primer.md
Thu Street Ops: Networking Troubleshooting networking-troubleshooting/street_ops.md
Fri Primer: MTU mtu/primer.md
Sat Case Study: MTU Blackhole TLS Stalls case-studies/networking/mtu-blackhole-tls-stalls/
Sun Case Study: DNS Resolution Slow case-studies/networking/dns-resolution-slow/

Week 2 — Kubernetes Networking and Ingress

Day Activity Resource
Mon Primer: K8s Networking k8s-networking/primer.md
Tue Primer: K8s Services and Ingress k8s-services-and-ingress/primer.md
Wed Primer: NGINX Web Servers nginx-web-servers/primer.md
Thu Primer: Load Balancing load-balancing/primer.md
Fri Case Study: SSL Cert Chain Incomplete case-studies/networking/ssl-cert-chain-incomplete/
Sat Case Study: DNS TLS cert-manager case-studies/cross-domain/dns-tls-certmanager/
Sun Case Study: CoreDNS Timeout Pod DNS case-studies/kubernetes_ops/coredns-timeout-pod-dns/

Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic networking --count 8

Capstone: Trace a full request path from client to pod using tcpdump, curl -v, and Ingress-NGINX logs.


If Linux & Infrastructure < 14

You need: Confidence in Linux system debugging and infrastructure-as-code. You must be able to diagnose disk, memory, and process issues on nodes, and understand Terraform and Ansible well enough to troubleshoot failures.

2-week plan:

Week 1 — Linux Operations

Day Activity Resource
Mon Primer: Linux Ops linux-ops/primer.md
Tue Primer: Linux Performance linux-performance/primer.md
Wed Street Ops: Linux Performance linux-performance/street_ops.md
Thu Primer: Linux Memory Management linux-memory-management/primer.md
Fri Case Study: OOM Killer Events case-studies/linux_ops/oom-killer-events/
Sat Primer: Disk and Storage Ops disk-and-storage-ops/primer.md
Sun Case Study: Runaway Logs Fill Disk case-studies/linux_ops/runaway-logs-fill-disk/

Week 2 — Infrastructure as Code

Day Activity Resource
Mon Primer: Terraform terraform/primer.md
Tue Street Ops: Terraform Deep Dive terraform-deep-dive/street_ops.md
Wed Primer: Ansible ansible/primer.md
Thu Street Ops: Ansible Deep Dive ansible-deep-dive/street_ops.md
Fri Case Study: Time Sync Skew Breaks App case-studies/linux_ops/time-sync-skew-breaks-app/
Sat Case Study: Terraform State Lock DynamoDB case-studies/cross-domain/terraform-state-lock-dynamodb/
Sun Case Study: Node NotReady NIC Firmware Ansible case-studies/cross-domain/node-notready-nic-firmware-ansible/

Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic linux --count 8

Capstone: On a test node, diagnose a synthetic disk/memory/process issue using only top, iostat, vmstat, df, du, lsof, dmesg, and journalctl.
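One way to rehearse the capstone is to collect a first pass from each of the eight tools in a fixed order. The sketch below is an assumption-laden starting point, not part of the wiki tooling: the flags assume a GNU/Linux userland (procps, sysstat, util-linux, systemd), `/var` is an arbitrary example path for `du`, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import Dict, List

# One read-only invocation per tool named in the capstone.
# Flags are assumptions for a typical GNU/Linux node; adjust as needed.
TRIAGE: Dict[str, List[str]] = {
    "top": ["top", "-b", "-n", "1"],            # single batch-mode snapshot
    "vmstat": ["vmstat", "1", "5"],             # five one-second samples
    "iostat": ["iostat", "-x", "1", "3"],       # extended per-device stats
    "df": ["df", "-h"],                         # filesystem usage
    "du": ["du", "-xh", "--max-depth=1", "/var"],  # example path only
    "lsof": ["lsof", "+L1"],                    # open files already unlinked
    "dmesg": ["dmesg", "-T"],                   # kernel log, readable timestamps
    "journalctl": ["journalctl", "-p", "err", "-b", "--no-pager"],
}


def missing_tools() -> List[str]:
    """Names of triage tools that are not on PATH."""
    return [name for name, argv in TRIAGE.items() if shutil.which(argv[0]) is None]


def run_triage(timeout: int = 30) -> Dict[str, str]:
    """Run each available tool once and return its captured output."""
    results: Dict[str, str] = {}
    for name, argv in TRIAGE.items():
        if shutil.which(argv[0]) is None:
            results[name] = "(not installed)"
            continue
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            results[name] = proc.stdout or proc.stderr
        except subprocess.TimeoutExpired:
            results[name] = f"(timed out after {timeout}s)"
    return results
```

Calling `run_triage()` on the test node returns a dictionary keyed by tool name. Some tools, such as `dmesg` and `lsof`, may need elevated privileges to show complete output.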


If Security < 12

You need: Understanding of the security stack (Vault, cert-manager, OPA) and incident response procedures for secret leaks and misconfigurations.

2-week plan:

Week 1 — Secrets and Certificate Management

Day Activity Resource
Mon Primer: HashiCorp Vault hashicorp-vault/primer.md
Tue Street Ops: Vault hashicorp-vault/street_ops.md
Wed Primer: Secrets Management secrets-management/primer.md
Thu Primer: cert-manager cert-manager/primer.md
Fri Primer: TLS PKI tls-pki/primer.md
Sat Case Study: Deployment Stuck ImagePull Vault case-studies/cross-domain/deployment-stuck-imagepull-vault/
Sun Case Study: DNS TLS cert-manager case-studies/cross-domain/dns-tls-certmanager/

Week 2 — Policy, Hardening, and Incident Response

Day Activity Resource
Mon Primer: Policy Engines (OPA) policy-engines/primer.md
Tue Primer: Security Basics security-basics/primer.md
Wed Primer: Linux Hardening linux-hardening/primer.md
Thu Primer: K8s RBAC k8s-rbac/primer.md
Fri Primer: Container Image Scanning container-images/primer.md
Sat Case Study: Container Vuln Scanner False Positive case-studies/cross-domain/container-vuln-scanner-false-positive/
Sun Primer: Supply Chain Security supply-chain-security/primer.md

Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic security --count 8

Capstone: Walk through a complete secret leak response: revoke, rotate, audit, and harden.


If CI/CD & DevOps Tooling < 12

You need: Fluency with the GitHub Actions + ArgoCD + Helm pipeline. You must be able to debug build failures, resolve GitOps drift, and execute safe rollbacks.

2-week plan:

Week 1 — CI/CD Fundamentals

Day Activity Resource
Mon Primer: GitHub Actions github-actions/primer.md
Tue Street Ops: GitHub Actions github-actions/street_ops.md
Wed Primer: CI/CD Patterns ci-cd-patterns/primer.md
Thu Primer: Docker docker/primer.md
Fri Primer: Container Image Optimization container-images/primer.md
Sat Case Study: CI Pipeline Docker Cache Registry case-studies/cross-domain/ci-pipeline-docker-cache-registry/
Sun Footguns: GitHub Actions github-actions/footguns.md

Week 2 — GitOps, Helm, and Deployment Strategies

Day Activity Resource
Mon Primer: ArgoCD GitOps argocd-gitops/primer.md
Tue Street Ops: ArgoCD GitOps argocd-gitops/street_ops.md
Wed Primer: Helm helm/primer.md
Thu Street Ops: Helm helm/street_ops.md
Fri Primer: Progressive Delivery progressive-delivery/primer.md
Sat Case Study: Canary Deploy Wrong Backend Ingress case-studies/cross-domain/canary-deploy-wrong-backend-ingress/
Sun Case Study: Pod OOMKilled Sidecar Helm case-studies/cross-domain/pod-oomkilled-sidecar-helm/

Daily drill (10 min): python3 tools/run_training_session.py build --strategy spaced --topic cicd --count 8

Capstone: Deploy a Helm chart via ArgoCD, introduce a breaking change, diagnose the drift, and roll back.


If Cross-Domain & Incident Response < 12

You need: Practice synthesizing information across domains under time pressure. Your knowledge of individual domains may be adequate, but your ability to correlate signals and respond is the gap.

2-week plan:

Week 1 — Incident Response Foundations

Day Activity Resource
Mon Primer: Incident Command incident-command/primer.md
Tue Street Ops: Incident Command incident-command/street_ops.md
Wed Primer: Incident Triage incident-triage/primer.md
Thu Primer: Incident Psychology incident-psychology/primer.md
Fri Primer: Postmortem and SLO postmortem-slo/primer.md
Sat Primer: Disaster Recovery disaster-recovery/primer.md
Sun Primer: Runbook Craft runbook-craft/primer.md

Week 2 — Cross-Domain Case Studies

Day Activity Resource
Mon Case Study: Alert Storm Flapping Healthchecks case-studies/cross-domain/alert-storm-flapping-healthchecks/
Tue Case Study: API Latency BGP Route Leak ACL case-studies/cross-domain/api-latency-bgp-route-leak-acl/
Wed Case Study: HPA Flapping Clock Skew NTP case-studies/cross-domain/hpa-flapping-clock-skew-ntp/
Thu Case Study: Service Mesh 503 Envoy RBAC case-studies/cross-domain/service-mesh-503-envoy-rbac/
Fri Case Study: Database Replication Lag RAID case-studies/cross-domain/database-replication-lag-raid/
Sat Case Study: Job Queue CPU Throttle cgroup case-studies/cross-domain/job-queue-cpu-throttle-cgroup/
Sun Case Study: SSH Timeout MTU Terraform case-studies/cross-domain/ssh-timeout-mtu-terraform/

Daily drill (15 min): python3 tools/run_training_session.py build --strategy spaced --mode mixed --count 10

Capstone: Run 3 incident scenarios back-to-back under time pressure: python3 tools/run_scenario.py
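The seven daily drill commands in the sections above differ only in their topic (or mode) selector and question count. A minimal sketch that rebuilds them from one table, using only the flags and values shown on this page; the `DRILLS` table and `drill_argv` helper are hypothetical names introduced for illustration.

```python
import shlex
from typing import Dict, List, Tuple

# Selector flags and counts copied from the "Daily drill" lines on this page.
DRILLS: Dict[str, Tuple[List[str], int]] = {
    "Kubernetes Operations": (["--topic", "k8s"], 10),
    "Observability": (["--topic", "observability"], 8),
    "Networking": (["--topic", "networking"], 8),
    "Linux & Infrastructure": (["--topic", "linux"], 8),
    "Security": (["--topic", "security"], 8),
    "CI/CD & DevOps Tooling": (["--topic", "cicd"], 8),
    "Cross-Domain & Incident Response": (["--mode", "mixed"], 10),
}


def drill_argv(section: str) -> List[str]:
    """Return the argument vector for a section's daily drill."""
    selector, count = DRILLS[section]
    return [
        "python3",
        "tools/run_training_session.py",
        "build",
        "--strategy",
        "spaced",
        *selector,
        "--count",
        str(count),
    ]


if __name__ == "__main__":
    for section in DRILLS:
        print(f"{section}: {shlex.join(drill_argv(section))}")
```

Keeping the result as an argument list means it can be passed directly to `subprocess.run` from the repository root without shell quoting concerns.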


Fast Track (4 weeks)

For engineers scoring 60-119 who need to get on-call ready quickly. This plan hits the highest-impact topics from each section, prioritizing hands-on practice over reading.

Week 1: Kubernetes and Linux Core

Day Morning (45 min) Evening (30 min)
Mon k8s-ops/primer.md Case: CrashLoopBackOff
Tue k8s-debugging-playbook/street_ops.md Case: Service No Endpoints
Wed linux-ops/primer.md Case: OOM Killer
Thu linux-performance/street_ops.md Case: Runaway Logs
Fri k8s-networking/primer.md Case: CNI Broken
Sat k8s-storage/primer.md Case: PV Stuck
Sun Review + spaced drill (60 min) python3 tools/run_training_session.py build --strategy spaced --count 20

Week 2: Observability and Networking

Day Morning (45 min) Evening (30 min)
Mon prometheus-deep-dive/primer.md prometheus-deep-dive/street_ops.md
Tue alerting-rules/primer.md Case: Grafana Empty
Wed dns-deep-dive/primer.md Case: DNS Slow
Thu tls-certificates-ops/primer.md Case: SSL Cert Chain
Fri tracing/primer.md + opentelemetry/primer.md Case: MTU Blackhole
Sat networking-troubleshooting/street_ops.md Case: DNS TLS cert-manager
Sun Review + spaced drill (60 min) python3 tools/run_training_session.py build --strategy spaced --count 20

Week 3: Security, CI/CD, and IaC

Day Morning (45 min) Evening (30 min)
Mon hashicorp-vault/primer.md hashicorp-vault/street_ops.md
Tue cert-manager/primer.md Case: Deployment Stuck Vault
Wed github-actions/primer.md Case: CI Pipeline Docker
Thu argocd-gitops/primer.md argocd-gitops/street_ops.md
Fri helm/primer.md Case: Canary Deploy
Sat terraform/primer.md Case: Terraform State Lock
Sun Review + spaced drill (60 min) python3 tools/run_training_session.py build --strategy spaced --count 20

Week 4: Incident Response and Cross-Domain

Day Morning (45 min) Evening (30 min)
Mon incident-command/primer.md incident-command/street_ops.md
Tue incident-triage/primer.md Case: Alert Storm
Wed postmortem-slo/primer.md Case: HPA Flapping NTP
Thu disaster-recovery/primer.md Case: Database Replication
Fri Cross-domain case study marathon 3 random case studies
Sat Full re-assessment assessment.md
Sun Gap analysis and targeted review Review weakest 10 questions from answer key

Daily (every day for 4 weeks): 15-minute spaced repetition drill before starting the morning session:

python3 tools/run_training_session.py build --strategy spaced --count 10


Deep Track (8 weeks)

For engineers scoring below 60 or those who want comprehensive mastery. This plan covers all topics with depth, including footguns, trivia, and the full case study library.
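The selection rules on this page reduce to a few comparisons. The sketch below assumes section scores arrive as a plain dictionary keyed by the section names used in the "If ..." headings; the function names and data shape are hypothetical and not part of the wiki tooling. The page does not say which track applies at a total of 120 or above, so the sketch returns `None` in that case.

```python
from typing import Dict, List, Optional

# Thresholds copied from the "If <section> < N" headings on this page.
WEAK_SECTION_THRESHOLDS: Dict[str, int] = {
    "Kubernetes Operations": 20,
    "Observability": 16,
    "Networking": 14,
    "Linux & Infrastructure": 14,
    "Security": 12,
    "CI/CD & DevOps Tooling": 12,
    "Cross-Domain & Incident Response": 12,
}


def weak_sections(scores: Dict[str, int]) -> List[str]:
    """Sections scored below their threshold; unscored sections are skipped."""
    return [
        name
        for name, threshold in WEAK_SECTION_THRESHOLDS.items()
        if name in scores and scores[name] < threshold
    ]


def recommended_track(total_score: int) -> Optional[str]:
    """Map a total score to the track ranges stated on this page."""
    if total_score < 60:
        return "Deep Track (8 weeks)"
    if total_score <= 119:
        return "Fast Track (4 weeks)"
    return None  # no track is specified for 120 and above


if __name__ == "__main__":
    example = {"Kubernetes Operations": 18, "Observability": 16, "Security": 9}
    print(weak_sections(example))
    print(recommended_track(85))
```

A score equal to the threshold does not count as weak, matching the strict "<" in the headings. Engineers who want comprehensive mastery can still choose the Deep Track regardless of score.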

Phase 1: Foundations (Weeks 1-2)

Goal: Build the core knowledge base across all domains.

Week 1:

Day Topic Resources
Mon Linux fundamentals linux-ops/primer.md, linux-users-and-permissions/primer.md
Tue Linux performance linux-performance/primer.md, linux-memory-management/primer.md
Wed Linux storage and disk disk-and-storage-ops/primer.md, mounts-filesystems/primer.md
Thu Linux networking linux-ops/primer.md, networking/primer.md
Fri Process management process-management/primer.md, linux-signals-and-process-control/primer.md
Sat Systemd and logging linux-ops-systemd/primer.md, linux-logging/primer.md
Sun Linux case studies (2) oom-killer-events, runaway-logs-fill-disk

Week 2:

Day Topic Resources
Mon Containers docker/primer.md, containers-deep-dive/primer.md
Tue Container images container-images/primer.md
Wed Networking fundamentals tcp-ip-deep-dive/primer.md, dns-deep-dive/primer.md
Thu TLS and certificates tls/primer.md, tls-pki/primer.md, tls-certificates-ops/primer.md
Fri Git and version control git/primer.md, git-advanced/primer.md
Sat YAML/JSON/config yaml-json-config/primer.md, environment-variables/primer.md
Sun Networking case studies (2) dns-resolution-slow, mtu-blackhole-tls-stalls

Phase 2: Kubernetes Deep Dive (Weeks 3-4)

Week 3:

Day Topic Resources
Mon K8s core operations k8s-ops/primer.md, k8s-ops/street_ops.md
Tue Pods and scheduling k8s-pods-and-scheduling/primer.md, k8s-ops (Probes)/primer.md
Wed Networking and services k8s-networking/primer.md, k8s-services-and-ingress/primer.md
Thu Storage k8s-storage/primer.md
Fri Debugging playbook k8s-debugging-playbook/primer.md, k8s-debugging-playbook/street_ops.md
Sat HPA and scaling k8s-ops (HPA)/primer.md, k8s-ops (HPA)/street_ops.md
Sun K8s case studies (3) crashloopbackoff, resource-quota, service-no-endpoints

Week 4:

Day Topic Resources
Mon Node lifecycle k8s-node-lifecycle/primer.md, node-maintenance/primer.md
Tue RBAC and security k8s-rbac/primer.md, k8s-rbac/street_ops.md
Wed K8s ecosystem k8s-ecosystem/primer.md
Thu Helm deep dive helm/primer.md, helm/street_ops.md, helm/footguns.md
Fri Kustomize and config kustomize/primer.md
Sat K8s case studies (3) cni-broken, drain-blocked, node-pressure
Sun Mid-point re-assessment (Sections 1 + 4 only) assessment.md

Phase 3: Observability and Security (Weeks 5-6)

Week 5:

Day Topic Resources
Mon Prometheus prometheus-deep-dive/primer.md, prometheus-deep-dive/street_ops.md
Tue Grafana and dashboards monitoring-fundamentals/primer.md, observability-deep-dive/primer.md
Wed Alerting alerting-rules/primer.md, postmortem-slo/primer.md
Thu Logging and Loki logging/primer.md, log-pipelines/primer.md
Fri Tracing and OTel tracing/primer.md, opentelemetry/primer.md
Sat SLO tooling slo-tooling/primer.md, dora-metrics/primer.md
Sun Observability case studies grafana-empty, disk-full-loki

Week 6:

Day Topic Resources
Mon Vault hashicorp-vault/primer.md, hashicorp-vault/street_ops.md
Tue Secrets management secrets-management/primer.md, secrets-management/footguns.md
Wed cert-manager and TLS cert-manager/primer.md, tls-certificates-ops/street_ops.md
Thu OPA and policy policy-engines/primer.md, security-basics/primer.md
Fri Container security container-images/primer.md, security-scanning/primer.md
Sat Linux hardening linux-hardening/primer.md, selinux-apparmor/primer.md
Sun Security case studies deployment-stuck-vault, container-vuln-scanner

Phase 4: CI/CD, IaC, and Incident Response (Weeks 7-8)

Week 7:

Day Topic Resources
Mon GitHub Actions github-actions/primer.md, github-actions/street_ops.md
Tue ArgoCD and GitOps argocd-gitops/primer.md, argocd-gitops/street_ops.md
Wed CI/CD patterns ci-cd-patterns/primer.md, progressive-delivery/primer.md
Thu Terraform terraform/primer.md, terraform-deep-dive/primer.md
Fri Ansible ansible/primer.md, ansible-deep-dive/primer.md
Sat IaC case studies terraform-state-lock, ansible-ssh-agent
Sun CI/CD case studies ci-pipeline-docker, canary-deploy

Week 8:

Day Topic Resources
Mon Incident command incident-command/primer.md, incident-command/street_ops.md
Tue Incident triage incident-triage/primer.md, incident-psychology/primer.md
Wed Postmortems and SLOs postmortem-slo/primer.md, postmortem-slo/street_ops.md
Thu Disaster recovery disaster-recovery/primer.md, backup-restore/primer.md
Fri Cross-domain case study marathon (3) alert-storm, hpa-flapping-ntp, service-mesh-503
Sat Cross-domain case study marathon (3) database-replication, job-queue-cgroup, node-notready-ansible
Sun Full re-assessment assessment.md — compare against initial scores

Daily (every day for 8 weeks): 15-minute spaced repetition drill:

python3 tools/run_training_session.py build --strategy spaced --count 10

Weekly (every Sunday): Run the study recommendation engine to adjust focus:

python3 tools/recommend_training_session.py --top 5


Quick Reference: Topic Pack Index

These are the most relevant topic packs for the production readiness assessment, organized by section:

Section Key Topic Packs
Kubernetes k8s-ops, k8s-debugging-playbook, k8s-networking, k8s-storage, k8s-ops (HPA), k8s-node-lifecycle, k8s-services-and-ingress, k8s-pods-and-scheduling, k8s-ops (Probes), k8s-rbac, helm
Observability prometheus-deep-dive, alerting-rules, logging, tracing, opentelemetry, observability-deep-dive, monitoring-fundamentals, slo-tooling
Networking dns-deep-dive, tls, tls-certificates-ops, mtu, networking-troubleshooting, k8s-networking, load-balancing, nginx-web-servers
Linux & Infra linux-ops, linux-performance, linux-memory-management, disk-and-storage-ops, terraform, terraform-deep-dive, ansible, ansible-deep-dive
Security hashicorp-vault, secrets-management, cert-manager, tls-pki, policy-engines, k8s-rbac, linux-hardening, container-images
CI/CD github-actions, argocd-gitops, helm, ci-cd-patterns, progressive-delivery, docker, container-images
Cross-Domain incident-command, incident-triage, incident-psychology, postmortem-slo, disaster-recovery, runbook-craft, chaos-engineering