
Lab 21: Production Readiness Review

Field            Value
Tier             5 (Capstone)
Estimated Time   2 hours
Prerequisites    All Tier 1-3 labs
Auto-Grade       Yes

Scenario

A development team is requesting production deployment approval for their new "Order Processing Service." They claim it is production-ready, but your platform team requires a formal Production Readiness Review (PRR) before any service goes live. You are the reviewer.

The service is deployed in the lab-prr namespace and has several issues that would cause problems in production: no resource limits, a single replica, no health checks, no monitoring, no security hardening, no backup strategy, no runbook, inadequate logging, and no graceful shutdown handling.

Your job is to audit the deployment against a production readiness checklist, fix every issue you find, and produce a PRR report documenting what was wrong and what you changed. The service must pass all production criteria before you sign off.

Objectives

  • Add resource requests and limits to all containers
  • Scale to at least 2 replicas with a PodDisruptionBudget
  • Add liveness and readiness probes
  • Add Prometheus monitoring annotations
  • Configure security context (non-root, read-only root FS, drop capabilities)
  • Add NetworkPolicy restricting ingress to required ports only
  • Create a ConfigMap-based runbook at /tmp/lab-prr/runbook.md
  • Write PRR report to /tmp/lab-prr/prr-report.txt
  • All pods are Running, Ready, and not restarting

Setup

./setup.sh

Deploys a minimal, non-production-ready service in namespace lab-prr.

Hints

Hint 1: Production readiness checklist
Key areas: reliability (replicas, probes, PDB), observability (metrics, logging), security (RBAC, securityContext, NetworkPolicy), operations (runbook, alerts).
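The reliability and observability items in Hint 1 might translate into manifest fields along the following lines. This is a sketch under stated assumptions, not the lab's solution: the Deployment name `order-service`, the container name `app`, the label `app: order-service`, port 8080, the probe paths `/ready` and `/healthz`, and every resource figure are placeholders to replace with whatever setup.sh actually deployed. The `prometheus.io/*` annotations are a widely used convention rather than a Kubernetes API, so they only take effect if the Prometheus scrape configuration reads them.

```yaml
# Partial Deployment spec: merge these fields into the existing object.
# All names, paths, ports, and numbers are placeholders (assumptions).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service            # hypothetical; use the name created by setup.sh
  namespace: lab-prr
spec:
  replicas: 2                    # at least 2, so one pod can be evicted safely
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"   # assumed metrics endpoint
    spec:
      containers:
        - name: app              # hypothetical container name
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:        # keeps the pod out of Service endpoints until it can serve
            httpGet:
              path: /ready       # assumed endpoint
              port: 8080
            periodSeconds: 5
          livenessProbe:         # restarts the container if it stops responding
            httpGet:
              path: /healthz     # assumed endpoint
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: order-service-pdb        # hypothetical
  namespace: lab-prr
spec:
  minAvailable: 1                # with 2 replicas, allows one voluntary eviction at a time
  selector:
    matchLabels:
      app: order-service         # must match the pod labels set by the Deployment
```

The Deployment document omits required fields such as the image and selector, so it is not a complete manifest; it is meant to be merged into the existing object, for example as a strategic merge patch.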
Hint 2: Security context
# Container-level block: place it under spec.template.spec.containers[]
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
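With readOnlyRootFilesystem: true, any path the process writes to needs an explicit writable volume, otherwise the container may fail at startup or at first write. The excerpt below sketches where the Hint 2 block sits in the pod template and pairs it with an emptyDir volume. The container name `app` and the `/tmp` mount path are assumptions; check which directories the service actually writes to.

```yaml
# Excerpt of the Deployment's pod template (placeholders are assumptions).
spec:
  template:
    spec:
      containers:
        - name: app                      # hypothetical container name
          securityContext:               # container-level settings from Hint 2
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: tmp
              mountPath: /tmp            # assumed writable path the app needs
      volumes:
        - name: tmp
          emptyDir: {}                   # scratch space that lasts only for the pod's lifetime
```

If the image's default user is root and the application cannot run as UID 1000, the pod will fail to start, so it is worth confirming the pods are Running and Ready after this change.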
Hint 3: NetworkPolicy for web service
Allow ingress only on the service port and deny everything else:
ingress:
- ports:
  - port: 8080
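For context, the fragment above is only the ingress section of a policy. A complete NetworkPolicy around it might look like the sketch below, where the policy name and the `app: order-service` pod label are assumptions and port 8080 is taken from the hint.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-service-ingress    # hypothetical
  namespace: lab-prr
spec:
  podSelector:
    matchLabels:
      app: order-service         # assumed pod label; must match the Deployment's pods
  policyTypes:
    - Ingress                    # selected pods reject any ingress not listed below
  ingress:
    - ports:
        - protocol: TCP
          port: 8080             # service port from Hint 3
```

Because the rule has no `from` clause, any source may connect on port 8080; all other inbound ports are denied for the selected pods. Enforcement depends on the cluster's network plugin supporting NetworkPolicy.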
Hint 4: Runbook structure
Include: service overview, dependencies, common failure modes, debugging steps, rollback procedure, escalation contacts.
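One possible shape for a ConfigMap-based runbook is sketched below. How the grader checks the runbook is not stated here, so this assumes the text is stored under a key named `runbook.md`, mirroring the file at /tmp/lab-prr/runbook.md. The ConfigMap name and the `<deployment-name>` placeholder are hypothetical, and the TODO lines are left for you to fill in.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service-runbook    # hypothetical
  namespace: lab-prr
data:
  runbook.md: |
    # Order Processing Service Runbook

    ## Service overview
    TODO: what the service does and who owns it.

    ## Dependencies
    TODO: upstream and downstream services, datastores.

    ## Common failure modes
    TODO: symptoms and likely causes.

    ## Debugging steps
    kubectl -n lab-prr get pods
    kubectl -n lab-prr describe deployment <deployment-name>
    kubectl -n lab-prr logs deployment/<deployment-name>

    ## Rollback procedure
    kubectl -n lab-prr rollout undo deployment/<deployment-name>

    ## Escalation contacts
    TODO: on-call rotation and escalation path.
```

Lines starting with `#` inside the block scalar are Markdown headings in the runbook text, not YAML comments.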
Hint 5: PRR report format
For each checklist item: current state (pass/fail), what was wrong, what you changed, evidence that the fix works.

Grading

./grade.sh

Solution

See the solution/ directory for the production-hardened manifests.