Portal | Level: L1: Foundations | Topics: Probes (Liveness/Readiness), Kubernetes Core | Domain: Kubernetes
Runbook: Readiness Probe Failed¶
Symptoms¶
- Pod is Running but not Ready (0/1)
- Service returns no endpoints
- kubectl describe pod shows "Readiness probe failed"
- Rollout appears stuck
Fast Triage¶
```
kubectl get pods -n grokdevops -o wide
kubectl describe pod -n grokdevops -l app.kubernetes.io/name=grokdevops | grep -A10 "Conditions\|Readiness"
kubectl get endpoints grokdevops -n grokdevops
kubectl logs -n grokdevops deploy/grokdevops
```
Likely Causes (ranked)¶
- Wrong probe path — endpoint doesn't exist or returns non-200
- App not listening on expected port — check containerPort vs probe port
- Slow startup — app needs more initialDelaySeconds
- Dependency not ready — app waiting for database/external service
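All four causes map to fields in the pod spec. A minimal sketch of the relevant container spec (the path, port, and timing values below are illustrative, not taken from this chart):

```yaml
containers:
  - name: grokdevops
    ports:
      - containerPort: 8080   # the port the app actually listens on
    readinessProbe:
      httpGet:
        path: /healthz        # must exist and return 2xx
        port: 8080            # must match containerPort (or a named port)
      initialDelaySeconds: 10 # raise this if the app starts slowly
      periodSeconds: 5
      failureThreshold: 3
```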
Evidence Interpretation¶
What bad looks like:
- 0/1 Ready — the container is running but the readiness probe is failing, so Kubernetes removes it from Service endpoints.
- kubectl get endpoints grokdevops -n grokdevops will show an empty or missing address list.
- During a rolling update, old pods keep serving traffic because new pods never become Ready; the rollout appears stuck.
- kubectl describe pod will show repeated "Readiness probe failed: …" events with the HTTP status or connection error.
Fix Steps¶
- Check what the probe is configured to hit:
- Test the endpoint manually:
- Fix the probe path in Helm values or patch:
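The three steps above can be sketched as follows. This is a hedged example: the container index, port, and /healthz path are assumptions to adapt to your chart, and step 3 shows a live patch (the equivalent Helm value is the durable fix):

```shell
# 1. Check what the probe is configured to hit (first container assumed)
kubectl get deploy grokdevops -n grokdevops \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'

# 2. Test the endpoint manually from inside the pod
#    (use wget or curl, whichever the image ships; path/port illustrative)
kubectl exec -n grokdevops deploy/grokdevops -- \
  wget -qO- http://localhost:8080/healthz

# 3. Patch the probe path in place as a stopgap
kubectl patch deploy grokdevops -n grokdevops --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/httpGet/path","value":"/healthz"}]'
```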
Verification¶
```
kubectl get pods -n grokdevops                    # expect 1/1 Ready
kubectl get endpoints grokdevops -n grokdevops    # expect IP addresses listed
```
Cleanup¶
Redeploy from Helm to ensure persistent fix:
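A sketch of the redeploy, assuming a local chart path and a values key for the probe path (both are hypothetical; check your chart's values.yaml for the actual key):

```shell
helm upgrade grokdevops ./charts/grokdevops -n grokdevops \
  --set readinessProbe.httpGet.path=/healthz \
  --wait
kubectl rollout status deploy/grokdevops -n grokdevops
```

Using --wait makes helm block until the new pods are Ready, so a still-broken probe fails the upgrade instead of silently leaving a stuck rollout.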
Unknown Unknowns¶
- Readiness and liveness probes do different things: readiness removes the pod from endpoints (no traffic), liveness restarts the container. Confusing them leads to wrong fixes.
- The Deployment has a progressDeadlineSeconds (default 600s). If new pods aren't Ready within that window the rollout is marked failed — but old pods still serve.
- Old ReplicaSet pods keep running and handling traffic during a stuck rollout; users may not notice the failure immediately.
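One way to surface a rollout that has exceeded progressDeadlineSeconds (deployment name from this runbook; --timeout here only bounds how long the status command waits, it does not change the deadline):

```shell
kubectl rollout status deploy/grokdevops -n grokdevops --timeout=60s

# When the deadline is exceeded, the Progressing condition's reason
# becomes ProgressDeadlineExceeded:
kubectl get deploy grokdevops -n grokdevops \
  -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'
```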
Pitfalls¶
[!WARNING] Readiness and liveness probes serve opposite purposes. Readiness removes the pod from traffic; liveness restarts the container. Adding a liveness probe to "fix" a readiness failure causes restart loops and potential downtime instead of graceful traffic removal.
- Deleting pods — the Deployment recreates them with the same failing probe config. Fix the probe or the app first.
- Not testing the endpoint from inside the pod — the probe runs inside the cluster network; test with kubectl exec ... wget/curl, not from your laptop.
See Also¶
- training/interactive/runtime-labs/lab-runtime-01-rollout-probe-failure/
- training/interview-scenarios/01-deployment-stuck-progressing.md
- training/interactive/incidents/scenarios/readiness-probe-wrong-path.sh
Wiki Navigation¶
Related Content¶
- Interview: Deployment Stuck Progressing (Scenario, L2) — Kubernetes Core, Probes (Liveness/Readiness)
- Kubernetes Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — Kubernetes Core, Probes (Liveness/Readiness)
- Lab: Readiness Probe Failure (CLI) (Lab, L1) — Kubernetes Core, Probes (Liveness/Readiness)
- Skillcheck: Kubernetes (Assessment, L1) — Kubernetes Core, Probes (Liveness/Readiness)
- Track: Kubernetes Core (Reference, L1) — Kubernetes Core, Probes (Liveness/Readiness)
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Kubernetes Core
- Case Study: Alert Storm — Flapping Health Checks (Case Study, L2) — Kubernetes Core
- Case Study: Canary Deploy Routing to Wrong Backend — Ingress Misconfigured (Case Study, L2) — Kubernetes Core
- Case Study: CrashLoopBackOff No Logs (Case Study, L1) — Kubernetes Core
- Case Study: DNS Looks Broken — TLS Expired, Fix Is Cert-Manager (Case Study, L2) — Kubernetes Core
Pages that link here¶
- Decision Tree: Deployment Is Stuck
- Decision Tree: Service Returning 5xx Errors
- Kubernetes - Skill Check
- Kubernetes Ops Domain
- Kubernetes_Core
- Level 3: Production Kubernetes
- Operational Runbooks
- Scenario: Deployment Stuck Progressing
- Solution: Lab Runtime 01 -- Readiness Probe Failure
- Track: Incident Response