Portal | Level: L1: Foundations | Topics: Probes (Liveness/Readiness), Kubernetes Core | Domain: Kubernetes

Runbook: Readiness Probe Failed

Symptoms

  • Pod is Running but not Ready (0/1)
  • Service returns no endpoints
  • kubectl describe pod shows "Readiness probe failed"
  • Rollout appears stuck

Fast Triage

kubectl get pods -n grokdevops -o wide
kubectl describe pod -n grokdevops -l app.kubernetes.io/name=grokdevops | grep -A10 "Conditions\|Readiness"
kubectl get endpoints grokdevops -n grokdevops
kubectl logs -n grokdevops deploy/grokdevops

Likely Causes (ranked)

  1. Wrong probe path — endpoint doesn't exist or returns non-200
  2. App not listening on expected port — check containerPort vs probe port
  3. Slow startup — app needs more initialDelaySeconds
  4. Dependency not ready — app waiting for database/external service
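Causes 1–3 map directly to fields on the container's readinessProbe. A minimal sketch of what that block typically looks like in the pod spec (the path, port, and timings here are illustrative assumptions, not this project's actual values):

```yaml
# Illustrative readinessProbe fragment for a container spec.
# Path, port, and timings are assumptions -- match them to the real app.
readinessProbe:
  httpGet:
    path: /health          # cause 1: must exist and return 200
    port: 8000             # cause 2: must match the port the app listens on
  initialDelaySeconds: 10  # cause 3: raise this if the app starts slowly
  periodSeconds: 5
  failureThreshold: 3
```

A dependency that isn't ready (cause 4) usually shows up as cause 1 in disguise: the endpoint exists but returns 503 until the database connects.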

Evidence Interpretation

What bad looks like:

NAME                          READY   STATUS    RESTARTS   AGE
grokdevops-6b5d4f7c88-abc12  0/1     Running   0          3m
  • 0/1 Ready — the container is running but the readiness probe is failing, so Kubernetes removes the pod from Service endpoints.
  • kubectl get endpoints grokdevops -n grokdevops shows an empty or missing address list.
  • During a rolling update, old pods keep serving traffic because new pods never become Ready; the rollout appears stuck.
  • kubectl describe pod shows repeated "Readiness probe failed: …" events with the HTTP status or connection error.

Fix Steps

  1. Check what the probe is configured to hit:
    kubectl get deploy grokdevops -n grokdevops \
      -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}' | python3 -m json.tool
    
  2. Test the endpoint manually:
    kubectl exec -n grokdevops deploy/grokdevops -- wget -qO- http://localhost:8000/health
    
  3. Fix the probe path in Helm values or patch:
    kubectl patch deployment grokdevops -n grokdevops --type=json \
      -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/httpGet/path","value":"/health"}]'
    
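If the chart exposes probe settings, the durable version of step 3 belongs in the Helm values rather than a live patch. A hypothetical values fragment, assuming the chart wires these keys into the container's readinessProbe (check the chart's templates before relying on these exact names):

```yaml
# Hypothetical values-dev.yaml fragment -- key names depend on the
# chart's templates; verify them before use.
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
```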

Verification

kubectl get pods -n grokdevops  # 1/1 Ready
kubectl get endpoints grokdevops -n grokdevops  # has IP addresses

Cleanup

Redeploy via Helm so the fix persists (the kubectl patch above is overwritten by the next chart release):

helm upgrade grokdevops devops/helm/grokdevops -n grokdevops -f devops/helm/values-dev.yaml

Unknown Unknowns

  • Readiness and liveness probes do different things: readiness removes the pod from endpoints (no traffic), liveness restarts the container. Confusing them leads to wrong fixes.
  • The Deployment has a progressDeadlineSeconds (default 600s). If new pods aren't Ready within that window the rollout is marked failed — but old pods still serve.
  • Old ReplicaSet pods keep running and handling traffic during a stuck rollout; users may not notice the failure immediately.
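The rollout deadline from the second bullet is set on the Deployment spec itself. A sketch, showing the Kubernetes default:

```yaml
# Deployment spec fragment -- 600 seconds is the Kubernetes default.
# The rollout is marked Failed (ProgressDeadlineExceeded) if new pods
# make no progress toward Ready within this window.
spec:
  progressDeadlineSeconds: 600
```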

Pitfalls

[!WARNING] Readiness and liveness probes serve opposite purposes. Readiness removes the pod from traffic; liveness restarts the container. Adding a liveness probe to "fix" a readiness failure causes restart loops and potential downtime instead of graceful traffic removal.

  • Deleting pods — the Deployment recreates them with the same failing probe config. Fix the probe or the app first.
  • Not testing the endpoint from inside the pod — the probe runs inside the cluster network; test with kubectl exec ... wget/curl, not from your laptop.

See Also

  • training/interactive/runtime-labs/lab-runtime-01-rollout-probe-failure/
  • training/interview-scenarios/01-deployment-stuck-progressing.md
  • training/interactive/incidents/scenarios/readiness-probe-wrong-path.sh
