---
tags:
  - k8s
  - l1
  - runbook
  - k8s-storage
---

Portal | Level: L1 Foundations | Topics: Kubernetes Storage | Domain: Kubernetes
# Runbook: PVC Stuck in Pending
| Field | Value |
|---|---|
| Domain | Kubernetes |
| Alert | kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1 for >5 min |
| Severity | P2 |
| Est. Resolution Time | 15-25 minutes |
| Escalation Timeout | 30 minutes — page if not resolved |
| Last Tested | 2026-03-19 |
| Prerequisites | kubectl access, cluster-admin or namespace-admin, kubeconfig configured |
## Quick Assessment (30 seconds)

- If output shows multiple PVCs Pending across different namespaces → the storage provisioner may be down; check the provisioner pod in kube-system before proceeding.
- If output shows a single PVC Pending → continue with the steps below.
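The triage above assumes you have a cluster-wide PVC listing in front of you; one quick way to get it:

```bash
# List every Pending PVC cluster-wide to judge the blast radius
# (STATUS is the third column of `kubectl get pvc -A`)
kubectl get pvc --all-namespaces --no-headers \
  | awk '$3 == "Pending" {print $1 "/" $2}'
```

Many results across namespaces points at the provisioner; a single result points at that claim's spec.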
## Step 1: Describe the PVC for Events

Why: The PVC events section is almost always the fastest path to the root cause — it shows provisioner messages, StorageClass errors, and topology failures.
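The command this step relies on:

```bash
# Show the PVC's status, spec, and recent events
kubectl describe pvc <PVC_NAME> -n <NAMESPACE>
```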
Expected output — look for the Events section:

```
Events:
  Warning  ProvisioningFailed  90s  persistentvolume-controller
           storageclass.storage.k8s.io "fast-ssd" not found
```
- `storageclass ... not found` → wrong StorageClass name — go to Step 2
- `no persistent volumes available for this claim` → no matching PV — go to Step 3
- `waiting for a volume to be created` → provisioner is working; wait longer or check the provisioner — go to Step 4
- `volume node affinity conflict` → zone/topology mismatch — go to Step 5
If no events appear: The provisioner may not be running at all — go to Step 4.
## Step 2: Check StorageClass Exists

Why: A typo in the StorageClass name, or a deleted StorageClass, is a very common cause. Dynamic provisioning silently fails if the StorageClass does not exist.
```bash
# List all available StorageClasses
kubectl get storageclass

# Check the StorageClass name in the PVC spec
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o jsonpath='{.spec.storageClassName}'
```

Expected output:

```
NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE
standard (default)   kubernetes.io/gce-pd   Delete          Immediate
fast-ssd             kubernetes.io/gce-pd   Retain          WaitForFirstConsumer
```
Fix — point the PVC at the correct StorageClass:

```bash
# storageClassName is immutable on an existing PVC, so the claim
# must be deleted and recreated with the correct name
kubectl delete pvc <PVC_NAME> -n <NAMESPACE>

# Recreate with the correct storageClassName field in the manifest
kubectl apply -f <FIXED_PVC_MANIFEST>
```
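A recreated PVC might look like the following sketch; every `<...>` value, the size, and the access mode are placeholders to adapt to your environment:

```bash
# Sketch: recreate the PVC with the corrected StorageClass name
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <PVC_NAME>
  namespace: <NAMESPACE>
spec:
  accessModes:
    - ReadWriteOnce            # placeholder access mode
  resources:
    requests:
      storage: 10Gi            # placeholder size
  storageClassName: fast-ssd   # must exactly match `kubectl get storageclass`
EOF
```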
## Step 3: Check If a Matching PV Is Available

Why: In clusters that use static provisioning (pre-created PVs), the PVC can only bind to a PV that matches its access mode, storage size, and StorageClass. If no matching PV exists, the PVC stays Pending.
```bash
# List available (unbound) PVs
kubectl get pv | grep Available

# Compare the PVC's requirements against each PV's spec
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o yaml | grep -E "accessModes|storage|storageClassName"
kubectl get pv -o yaml | grep -E "accessModes|storage|storageClassName|phase"
```
Note: A PV whose claimRef points to a different PVC is reserved and cannot be reused without clearing the claimRef.
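If a stale claimRef is the blocker and you are certain the old claim is gone, one way to release the PV is to null out the claimRef; treat this as a sketch and confirm the volume is safe to rebind first:

```bash
# DANGER: only do this if the referenced PVC no longer exists and
# the data on the volume is confirmed safe to reuse
kubectl patch pv <PV_NAME> --type merge -p '{"spec":{"claimRef":null}}'

# The PV should return to Available shortly afterwards
kubectl get pv <PV_NAME>
```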
## Step 4: Check Storage Provisioner Logs

Why: The dynamic provisioner (e.g., aws-ebs-csi-driver, gce-pd-csi-driver, ceph-csi) is responsible for creating the underlying volume. If it is crashed or misconfigured, all dynamic PVCs in the cluster will be stuck.
```bash
# Find the provisioner pod — it is usually in kube-system
kubectl get pods -n kube-system | grep -E "csi|provisioner|storage"

# Check provisioner logs
kubectl logs -n kube-system <PROVISIONER_POD_NAME> --tail=50

# Check if the provisioner has the necessary permissions
kubectl get clusterrolebinding | grep provisioner
```

Healthy log output looks like:

```
I0319 10:00:00.000000  1 controller.go:1234] provisioning volume for claim "default/my-pvc"
I0319 10:00:05.000000  1 controller.go:1234] successfully provisioned volume pvc-abc123
```
If the provisioner is crashing or logging errors: restart it with `kubectl rollout restart deployment/<PROVISIONER_DEPLOYMENT> -n kube-system` and check the fresh logs for IAM/permission errors.
If this fails: Escalate to the platform team — the provisioner may need IAM role or cloud credentials reconfiguration.
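As a supplementary check (assuming a CSI-based provisioner), the registered drivers and recent warning events often surface credential or sidecar problems:

```bash
# List CSI drivers registered with the cluster
kubectl get csidrivers

# Recent Warning events in kube-system frequently show provisioner failures
kubectl get events -n kube-system --field-selector type=Warning \
  --sort-by=.lastTimestamp | tail -20
```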
## Step 5: Check Node Affinity and Topology Constraints

Why: When volumeBindingMode: WaitForFirstConsumer is set on the StorageClass (which is correct for zone-aware provisioning), the PVC only binds once a pod is scheduled to use it. If the pod cannot be scheduled, the PVC stays Pending indefinitely.
```bash
# Check the StorageClass binding mode
kubectl get storageclass <STORAGECLASS_NAME> -o jsonpath='{.volumeBindingMode}'

# Check the pod that should be using this PVC
kubectl get pods -n <NAMESPACE> -o wide | grep <APP_NAME>
kubectl describe pod <POD_NAME> -n <NAMESPACE> | grep -A 10 "Events:"

# If the pod is also Pending, check its node affinity
kubectl get pod <POD_NAME> -n <NAMESPACE> -o yaml | grep -A 20 "affinity:"
```

A topology conflict looks like:

```
Events:
  Warning  FailedScheduling  2m  default-scheduler
           0/3 nodes are available: 3 node(s) had volume node affinity conflict.
```

```bash
# Check which zones your nodes are in
kubectl get nodes -o custom-columns='NAME:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone'
```
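If a PV already exists for the claim, comparing its topology constraint with the node zones can confirm the mismatch; a minimal check:

```bash
# The PV's nodeAffinity shows which zone(s) the volume is pinned to
kubectl get pv <PV_NAME> -o jsonpath='{.spec.nodeAffinity.required}'
```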
## Step 6: Manually Provision a PV If Needed

Why: In some environments, dynamic provisioning is unavailable (air-gapped clusters, strict quotas) and a PV must be created manually to match the PVC.
```bash
# Check the exact PVC requirements
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o yaml

# Create a matching PV — edit the spec to match the PVC's
# accessModes, storage, and storageClassName
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <PV_NAME>
spec:
  capacity:
    storage: <STORAGE_SIZE>   # e.g., 10Gi — must be >= PVC request
  accessModes:
    - <ACCESS_MODE>           # e.g., ReadWriteOnce
  storageClassName: <STORAGECLASS_NAME>
  persistentVolumeReclaimPolicy: Retain
  # Add your volume source here, e.g. for NFS:
  nfs:
    path: <NFS_PATH>
    server: <NFS_SERVER_IP>
EOF
```
```bash
kubectl get pvc <PVC_NAME> -n <NAMESPACE>
```

Expected output:

```
NAME     STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS
my-pvc   Bound    my-pv    10Gi       RWO            fast-ssd
```
If the PVC still does not bind: check for `selector` or `volumeName` fields in the PVC spec that further restrict binding.
## Verification

```bash
# Confirm the issue is resolved
kubectl get pvc -n <NAMESPACE>
kubectl get pods -n <NAMESPACE> | grep <APP_NAME>
```

Success looks like: the PVC shows Bound status, and the pod using the PVC transitions from Pending to Running.
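For scripted verification, `kubectl wait` can block until the claim binds (jsonpath conditions require kubectl v1.23+); a sketch:

```bash
# Wait up to 2 minutes for the PVC to reach Bound
kubectl wait --for=jsonpath='{.status.phase}'=Bound \
  pvc/<PVC_NAME> -n <NAMESPACE> --timeout=120s
```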
If still broken: Escalate — see below.
## Escalation

| Condition | Who to Page | What to Say |
|---|---|---|
| Not resolved in 30 min | SRE on-call | "Kubernetes PVC `<PVC_NAME>` stuck Pending in `<NAMESPACE>`" |
| Data loss suspected | Platform Lead | "Data loss risk: PVC `<PVC_NAME>` in `<NAMESPACE>`" |
| Scope expanding beyond namespace | Platform team | "Multi-namespace impact: storage provisioner down, all dynamic PVCs failing cluster-wide" |
## Post-Incident
- Update monitoring if alert was noisy or missing
- File postmortem if P1/P2
- Update this runbook if steps were wrong or incomplete
- Confirm the StorageClass name in the application manifest is correct in git
- Check if cluster has zone-aware nodes to match topology requirements
## Common Mistakes

- Wrong StorageClass name: This is the most common cause. The StorageClass name in the PVC must exactly match an existing StorageClass object — Kubernetes does not fuzzy-match or fall back to a default automatically (a default is used only when a StorageClass carries the `storageclass.kubernetes.io/is-default-class: "true"` annotation and the PVC omits `storageClassName` entirely). Always run `kubectl get storageclass` to confirm the exact name before deploying.
- Zone affinity mismatch with the pod: When using `WaitForFirstConsumer` binding mode, the PVC cannot bind until the pod is scheduled. If the pod has a node affinity constraint pinning it to `us-east-1a` but no nodes or EBS volumes exist in that zone, both the pod and the PVC will be stuck indefinitely. The fix is usually to align the pod's zone affinity with available nodes — not to fight the storage topology.
- Deleting a Bound PVC thinking it is safe: If a PVC has ever been Bound and you delete it, the reclaim policy on the PV determines whether the underlying data is deleted (`Delete`) or preserved (`Retain`). Always check the reclaim policy before deleting a PVC. Even a PVC that is currently Pending may have a PV from a previous binding that contains data.
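The reclaim-policy check from the last point can be done at a glance; one possible one-liner:

```bash
# Show each PV's reclaim policy and which claim it is bound to
kubectl get pv -o custom-columns='NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,CLAIM:.spec.claimRef.name'
```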
## Prevention

- Always set a default StorageClass
- Use `volumeBindingMode: WaitForFirstConsumer` to avoid zone issues
- Monitor CSI driver health
- Set cloud resource quotas with headroom
- Test PVC creation in CI/staging before production
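Marking a class as the default (the first prevention point) is a one-line annotation; a sketch, assuming `standard` is the class you want as default:

```bash
# Annotate the chosen StorageClass as the cluster default
kubectl patch storageclass standard \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```

Remember to clear the annotation on any previous default so only one class carries it.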
## Cross-References
- Survival Guide: On-Call Survival Guide (pocket card version)
- Topic Pack: Kubernetes Topics (deep background)
- Related Runbook: pod-crashloop.md — if pod crashes after PVC binds
- Related Runbook: deploy-stuck.md — if deployment stalls because pods are waiting on PVC
- Related Runbook: node-not-ready.md — if node issues prevent volume attachment
## Related Content
- Case Study: Persistent Volume Stuck Terminating (Case Study, L2) — Kubernetes Storage
- Database Operations on Kubernetes (Topic Pack, L2) — Kubernetes Storage
- K8s Storage (Topic Pack, L1) — Kubernetes Storage
- Kubernetes Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — Kubernetes Storage
- Kubernetes Storage Flashcards (CLI) (flashcard_deck, L1) — Kubernetes Storage
- Track: Kubernetes Core (Reference, L1) — Kubernetes Storage