Solution: Lab Runtime 01 -- Readiness Probe Failure

SPOILER WARNING: Try to solve it yourself first. Use hints progressively.


Hint Ladder

Hint 1: The problem is with how Kubernetes determines if a pod is ready to receive traffic. What mechanism does K8s use for that?

Hint 2: Check the readiness probe configuration on the deployment. What HTTP path is it hitting? Does that path exist?

Hint 3: The readiness probe was changed to hit /nonexistent. K8s gets a non-200 response, so it marks the pod as not ready. Check with: kubectl get deploy grokdevops -n grokdevops -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'

Hint 4: Fix the probe path back to /health. You can either patch the deployment directly or use ./fix.sh.


Minimal Solution

kubectl patch deployment grokdevops -n grokdevops --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/httpGet/path","value":"/health"}]'
kubectl rollout status deployment/grokdevops -n grokdevops --timeout=120s

Explain

Symptom: New pods show READY 0/1 with STATUS Running. The rollout stalls, and the old pods continue serving traffic.

Evidence: kubectl describe pod shows Readiness probe failed: HTTP probe failed with statuscode: 404. The readiness probe is configured to hit /nonexistent, which returns 404.

Root cause: The readiness probe's HTTP path was changed from /health to /nonexistent. K8s uses readiness probes to decide whether a pod should receive traffic via the Service: when the probe fails, the pod is removed from the Service endpoints and no traffic is routed to it. During a rolling update this means the new pods never become ready, so the Deployment controller never terminates the old pods.
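For reference, a healthy container spec carries a probe block like the following (the port and timing values are illustrative, not taken from the lab):

```yaml
readinessProbe:
  httpGet:
    path: /health        # must be a route the app actually serves
    port: 8080           # illustrative; match the container's port
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3    # pod leaves endpoints after 3 consecutive failures
```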

Key insight: Readiness probe failures don't restart pods (that's liveness probes). They only remove pods from service endpoints. This is why the pod shows Running but 0/1 Ready.
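The contrast matters when choosing probes: had the broken path been on a liveness probe instead, the kubelet would have restarted the container in a loop rather than just withholding traffic. A sketch of the two side by side (illustrative values):

```yaml
livenessProbe:           # failure => kubelet restarts the container
  httpGet:
    path: /health
    port: 8080
readinessProbe:          # failure => pod removed from Service endpoints only
  httpGet:
    path: /health
    port: 8080
```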


Prevent

  • Pin the readiness probe config in Helm values rather than hardcoding it in templates
  • Add a CI check that validates probe endpoints exist in the app
  • Use helm upgrade --dry-run before applying to catch bad probe configs
  • Set progressDeadlineSeconds to a reasonable value so stuck rollouts are detected quickly
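The CI check from the second bullet could be sketched as a small shell function that extracts every httpGet probe path from rendered manifests (e.g. the output of helm template) and compares it against a list of routes the app is known to serve. The file names and the idea of an exported route list are assumptions, not part of the lab:

```shell
# check_probe_paths MANIFESTS ROUTES
# MANIFESTS: rendered K8s YAML (e.g. from `helm template`).
# ROUTES: one known app route per line (assumed to be produced by the app's
# build or test pipeline). Returns non-zero if any probe path is unknown.
check_probe_paths() {
  manifests="$1"; routes="$2"; bad=0
  # Grab every httpGet path used by a probe in the rendered manifests.
  for p in $(grep -A3 'httpGet:' "$manifests" | awk '/path:/ {print $2}' | sort -u); do
    grep -qx "$p" "$routes" || { echo "unknown probe path: $p" >&2; bad=1; }
  done
  return $bad
}
```

Running this in CI would have caught the /nonexistent path before it reached the cluster.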

See Also