Drill: Debug a Pod Stuck in Pending State

Goal

Systematically diagnose why a pod is stuck in Pending state by checking scheduling constraints, resources, and events.

kubectl configured with cluster access
A pod that is in Pending state (or create one with impossible resource requests to practice)
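
If no Pending pod is at hand, one way to produce one is sketched below. The pod name pending-demo, the file name, and the pause image tag are illustrative assumptions, not part of the drill. The CPU request is chosen only because it exceeds what a node is likely to offer.

```shell
# Write a manifest for a pod whose CPU request no node is likely to satisfy.
# The pod name, file name, and image are illustrative assumptions.
cat > pending-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pending-demo
spec:
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "1000"      # 1000 cores: intentionally unschedulable
        memory: "64Mi"
EOF

# Apply it to the namespace you are practising in (uncomment to run):
# kubectl apply -f pending-demo.yaml -n <namespace>
```

Once applied, the pod should stay Pending and its events should report insufficient CPU.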

Confirm the pod is Pending:

kubectl get pod <pod-name> -n <namespace> -o wide

Check pod events for scheduling failures:

kubectl describe pod <pod-name> -n <namespace> | grep -A 20 "Events:"
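
As an alternative to grepping the describe output, events can be queried directly. The helper below is a minimal sketch; the function name pod_events is hypothetical, and it assumes the events you need have not yet expired from the cluster.

```shell
# List events that reference one pod, oldest first.
# Arguments: $1 = pod name, $2 = namespace.
pod_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}

# Usage: pod_events <pod-name> <namespace>
```

Scheduling failures appear with the reason FailedScheduling, and the message column carries the detail.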

Check node resources available:

kubectl describe nodes | grep -A 5 "Allocated resources"
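
The scheduler places pods by comparing their resource requests with each node's allocatable capacity minus what is already requested, so it can help to view both sides next to each other. The two helpers below are a sketch; the function names are hypothetical.

```shell
# Print the resource requests of each container in a pod.
# Arguments: $1 = pod name, $2 = namespace.
pod_requests() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'
}

# Print each node's allocatable CPU and memory.
node_allocatable() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}

# Usage:
#   pod_requests <pod-name> <namespace>
#   node_allocatable
```

If a single container requests more than any node's allocatable amount, the pod cannot be scheduled no matter how idle the cluster is.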

Check if node selectors or affinities are too restrictive:

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeSelector}'
kubectl get pod <pod-name> -n <namespace> -o yaml | grep -A 10 affinity
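
To confirm whether any node can satisfy a selector, list the nodes that carry the label in question. This is a minimal sketch; the function name and the example label are hypothetical.

```shell
# List the nodes that carry a given label.
# Argument: $1 = label selector in key=value form.
nodes_with_label() {
  kubectl get nodes -l "$1" -o name
}

# Usage (the label is a made-up example):
#   nodes_with_label disktype=ssd
```

Empty output means no node matches, so the pod's nodeSelector cannot be met until a node is labelled accordingly or the selector is relaxed.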

Check for taints that may prevent scheduling:

kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key,TAINT_EFFECTS:.spec.taints[*].effect'

Check tolerations on the pod:

kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.tolerations}' | python3 -m json.tool

Check if PersistentVolumeClaims are bound:

kubectl get pvc -n <namespace>

Check resource quotas in the namespace:

kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>

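Insufficient cpu or Insufficient memory in events means no node has enough unreserved capacity to satisfy the pod's resource requests. The scheduler counts requests, not actual usage.
node(s) didn't match Pod's node affinity/selector means no node's labels satisfy the pod's nodeSelector or node affinity rules.
node(s) had taints that the pod didn't tolerate means the pod lacks a toleration for a taint on those nodes. Some newer releases word this as node(s) had untolerated taint.
persistentvolumeclaim not found or unbound means a PVC the pod references does not exist or has not been bound to a volume, so storage is not available.
Quota exceeded (reported as exceeded quota) means the namespace's resource quota is used up. This error normally rejects pod creation outright, so look for it in the events of the owning ReplicaSet, Job, or other controller rather than on a Pending pod.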
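
The mapping above can be sketched as a small shell function that takes an event message and prints a likely cause. The function name and the category labels it prints are illustrative assumptions. The patterns are based on the messages listed above, and exact wording may vary between Kubernetes versions.

```shell
# Map a scheduling event message to a likely cause.
# The category names printed here are illustrative, not kubectl output.
# Only the first matching pattern is reported.
classify_pending_reason() {
  case "$1" in
    *"Insufficient cpu"*|*"Insufficient memory"*) echo "resources" ;;
    *"node affinity/selector"*)                   echo "affinity" ;;
    *"taint"*)                                    echo "taints" ;;
    *[Pp]ersistent[Vv]olume[Cc]laim*)             echo "storage" ;;
    *"exceeded quota"*|*"Quota exceeded"*)        echo "quota" ;;
    *)                                            echo "unknown" ;;
  esac
}

# Usage:
#   classify_pending_reason "0/3 nodes are available: 3 Insufficient cpu."
```

A single message can list several causes at once. In that case, read the full message rather than relying on the first category.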

No cleanup needed if you are inspecting an existing pod. Delete any test pods you created.