
Runbook: PVC Stuck in Pending

Field                 Value
Domain                Kubernetes
Alert                 kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1 for >5 min
Severity              P2
Est. Resolution Time  15-25 minutes
Escalation Timeout    30 minutes — page if not resolved
Last Tested           2026-03-19
Prerequisites         kubectl access, cluster-admin or namespace-admin, kubeconfig configured

Quick Assessment (30 seconds)

# Run this first — it tells you the scope of the problem
kubectl get pvc -n <NAMESPACE>
If output shows: multiple PVCs Pending across different namespaces → The storage provisioner may be down; check the provisioner pod in kube-system before proceeding.
If output shows: a single PVC Pending → Continue with the steps below.
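To gauge scope across all namespaces in one pass, a small helper (a sketch; the awk field positions assume the default `kubectl get pvc -A` table layout, where column 1 is NAMESPACE and column 3 is STATUS) can summarize Pending PVCs per namespace:

```shell
# Summarize Pending PVCs per namespace from plain kubectl table output.
# Usage: kubectl get pvc -A --no-headers | pending_by_ns
pending_by_ns() {
  # $1 = NAMESPACE, $3 = STATUS in `kubectl get pvc -A` output
  awk '$3 == "Pending" { count[$1]++ } END { for (ns in count) print ns, count[ns] }'
}
```

More than one namespace in the output suggests a cluster-wide provisioner problem rather than a single misconfigured claim.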

Step 1: Describe the PVC for Events

Why: The PVC events section is almost always the fastest path to the root cause — it shows provisioner messages, StorageClass errors, and topology failures.

kubectl describe pvc <PVC_NAME> -n <NAMESPACE>
Expected output — look for the Events section:
Events:
  Warning  ProvisioningFailed  90s   persistentvolume-controller
    storageclass.storage.k8s.io "fast-ssd" not found
Common event messages and what they mean:
  • storageclass ... not found → Wrong StorageClass name — go to Step 2
  • no persistent volumes available for this claim → No matching PV — go to Step 3
  • waiting for a volume to be created → Provisioner is working; wait longer or check the provisioner — go to Step 4
  • volume node affinity conflict → Zone/topology mismatch — go to Step 5
If no events appear: The provisioner may not be running at all — go to Step 4.
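The triage mapping above can be sketched as a small helper for pasting an event message into (illustrative only; the patterns mirror the list above and are not an exhaustive set of provisioner messages):

```shell
# Map a PVC event message to the runbook step it points at.
next_step() {
  case "$1" in
    *"not found"*)                          echo "Step 2: StorageClass missing" ;;
    *"no persistent volumes available"*)    echo "Step 3: no matching PV" ;;
    *"waiting for a volume to be created"*) echo "Step 4: check provisioner" ;;
    *"node affinity conflict"*)             echo "Step 5: topology mismatch" ;;
    *)                                      echo "Unrecognized: start at Step 4" ;;
  esac
}
```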

Step 2: Check StorageClass Exists

Why: A typo in the StorageClass name or a deleted StorageClass is a very common cause. Dynamic provisioning silently fails if the StorageClass does not exist.

# List all available StorageClasses
kubectl get storageclass

# Check the StorageClass name in the PVC spec
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o jsonpath='{.spec.storageClassName}'
Expected output:
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE
standard (default)   kubernetes.io/gce-pd    Delete          Immediate
fast-ssd             kubernetes.io/gce-pd    Retain          WaitForFirstConsumer
If the StorageClass from the PVC does not appear in the list:
# storageClassName is immutable on an existing PVC — delete and recreate
# (there is no patch option for this field)
kubectl delete pvc <PVC_NAME> -n <NAMESPACE>
# Recreate with the correct storageClassName field in the manifest
kubectl apply -f <FIXED_PVC_MANIFEST>
WARNING: Deleting a PVC that is bound to a PV may delete the PV depending on the reclaim policy. Confirm the PVC is still Pending (not Bound) before deleting.
If this fails: The required StorageClass may need to be created by the platform team — do not create StorageClass objects without platform team approval.
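In table output the default StorageClass is flagged with a literal (default) marker next to its name, which a one-line filter can extract (a sketch assuming the default kubectl column layout):

```shell
# Print the name of the default StorageClass, if any.
# Usage: kubectl get storageclass --no-headers | default_sc
default_sc() {
  awk '$2 == "(default)" { print $1 }'
}
```

If the PVC omits storageClassName entirely, this is the class the cluster will use for it.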

Step 3: Check If a Matching PV Is Available

Why: In clusters that use static provisioning (pre-created PVs), the PVC can only bind to a PV that matches its access mode, storage size, and StorageClass. If no matching PV exists, the PVC stays Pending.

# List available (unbound) PVs
kubectl get pv | grep Available

# Get detailed PV spec to check access modes and capacity
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o yaml | grep -E "accessModes|storage:|storageClassName"
kubectl get pv -o yaml | grep -E "accessModes|storage:|storageClassName|phase:"
Expected output (PVC requirements):
accessModes:
- ReadWriteOnce
storage: 10Gi
storageClassName: fast-ssd
If no Available PV matches: The PV must be created manually. See Step 6 for manual PV provisioning.
If a matching PV exists but is not binding: Check whether the PV has a claimRef pointing to a different PVC — it is reserved and cannot be reused without clearing the claimRef.
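To check whether an apparently Available PV is actually reserved, this sketch extracts the claimRef name from the PV's YAML (the awk pattern assumes standard kubectl YAML layout, where the first name: line after claimRef: is the claiming PVC):

```shell
# Print the PVC name a PV's claimRef points at, if present.
# Usage: kubectl get pv <PV_NAME> -o yaml | claimed_by
claimed_by() {
  awk '/claimRef:/ { f = 1 } f && /name:/ { print $2; exit }'
}
# To release a reserved PV (only after confirming the old claim is truly gone):
#   kubectl patch pv <PV_NAME> --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'
```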

Step 4: Check Storage Provisioner Logs

Why: The dynamic provisioner (e.g., aws-ebs-csi-driver, gce-pd-csi-driver, ceph-csi) is responsible for creating the underlying volume. If it is crashed or misconfigured, all dynamic PVCs in the cluster will be stuck.

# Find the provisioner pod — it is usually in kube-system
kubectl get pods -n kube-system | grep -E "csi|provisioner|storage"

# Check provisioner logs
kubectl logs -n kube-system <PROVISIONER_POD_NAME> --tail=50

# Check if the provisioner has the necessary permissions
kubectl get clusterrolebinding | grep provisioner
Expected output (healthy provisioner):
I0319 10:00:00.000000 1 controller.go:1234] provisioning volume for claim "default/my-pvc"
I0319 10:00:05.000000 1 controller.go:1234] successfully provisioned volume pvc-abc123
Expected output (provisioner error):
E0319 10:00:00.000000 1 controller.go:500] error provisioning volume: InvalidParameterValue: ...
If the provisioner is crashing: Restart it with kubectl rollout restart deployment/<PROVISIONER_DEPLOYMENT> -n kube-system and check for IAM/permission errors.
If this fails: Escalate to the platform team — the provisioner may need IAM role or cloud credentials reconfiguration.
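A quick filter over the kube-system pod list can flag provisioner pods that are not Running (a sketch; column 3 is STATUS in the default kubectl table, and the name patterns match common CSI deployment names, not a guaranteed convention):

```shell
# Print CSI/provisioner pods whose status is not Running, with their status.
# Usage: kubectl get pods -n kube-system --no-headers | unhealthy_csi
unhealthy_csi() {
  awk '/csi|provisioner/ && $3 != "Running" { print $1, $3 }'
}
```

Empty output means the provisioner pods at least look healthy; move on to reading their logs.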

Step 5: Check Node Affinity and Topology Constraints

Why: When volumeBindingMode: WaitForFirstConsumer is set on the StorageClass (which is correct for zone-aware provisioning), the PVC only binds once a pod is scheduled to use it. If the pod cannot be scheduled, the PVC stays Pending indefinitely.

# Check the StorageClass binding mode
kubectl get storageclass <STORAGECLASS_NAME> -o jsonpath='{.volumeBindingMode}'

# Check the pod that should be using this PVC
kubectl get pods -n <NAMESPACE> -o wide | grep <APP_NAME>
kubectl describe pod <POD_NAME> -n <NAMESPACE> | grep -A 10 "Events:"

# If pod is also pending, check its node affinity
kubectl get pod <POD_NAME> -n <NAMESPACE> -o yaml | grep -A 20 "affinity:"
Expected output (zone mismatch):
Events:
  Warning  FailedScheduling  2m  default-scheduler
    0/3 nodes are available: 3 node(s) had volume node affinity conflict.
If topology constraint is the problem: Either remove the node affinity from the pod, or ensure nodes exist in the required zone:
# Check which zones your nodes are in
kubectl get nodes -o custom-columns='NAME:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone'
If this fails: A multi-zone cluster may need a node added to the required zone — escalate to the platform team.
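The zone listing above can be summarized into node counts per zone, to confirm at a glance whether the required zone has any capacity at all (a sketch over the custom-columns output, header line included):

```shell
# Count nodes per zone from NAME/ZONE custom-columns output (skips the header).
# Usage: kubectl get nodes -o custom-columns='NAME:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone' | zone_counts
zone_counts() {
  awk 'NR > 1 { count[$2]++ } END { for (z in count) print z, count[z] }'
}
```

A zone required by the volume but absent from this summary confirms the affinity conflict.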

Step 6: Manually Provision a PV If Needed

Why: In some environments, dynamic provisioning is unavailable (air-gapped clusters, strict quotas) and a PV must be created manually to match the PVC.

# Check the exact PVC requirements
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o yaml

# Create a matching PV — edit the spec to match the PVC's accessModes, storage, and storageClassName
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <PV_NAME>
spec:
  capacity:
    storage: <STORAGE_SIZE>        # e.g., 10Gi — must be >= PVC request
  accessModes:
    - <ACCESS_MODE>                # e.g., ReadWriteOnce
  storageClassName: <STORAGECLASS_NAME>
  persistentVolumeReclaimPolicy: Retain
  # Add your volume source here, e.g. for NFS:
  nfs:
    path: <NFS_PATH>
    server: <NFS_SERVER_IP>
EOF
Expected output (PVC binds after PV creation):
kubectl get pvc <PVC_NAME> -n <NAMESPACE>
NAME      STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS
my-pvc    Bound    my-pv     10Gi       RWO            fast-ssd
If this fails: The PV spec may not satisfy the PVC selector — check for selector or volumeName fields in the PVC spec that further restrict binding.
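As a sanity check before writing the PV manifest, the capacity must satisfy the PVC request; a minimal comparison (deliberately simplified to Gi-suffixed sizes only; real Kubernetes quantity parsing also handles Mi, Ti, and plain bytes) looks like:

```shell
# Compare a PV capacity against a PVC request (Gi units only, a simplification).
gi() { echo "${1%Gi}"; }
pv_fits() {
  if [ "$(gi "$1")" -ge "$(gi "$2")" ]; then echo "fits"; else echo "too small"; fi
}
# pv_fits 20Gi 10Gi → fits
```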

Verification

# Confirm the issue is resolved
kubectl get pvc -n <NAMESPACE>
kubectl get pods -n <NAMESPACE> | grep <APP_NAME>
Success looks like: PVC shows Bound status, and the pod using the PVC transitions from Pending to Running.
If still broken: Escalate — see below.

Escalation

Condition                          Who to Page    What to Say
Not resolved in 30 min             SRE on-call    "Kubernetes PVC stuck Pending in <NAMESPACE>, PVC <PVC_NAME>, provisioner unresponsive, runbook exhausted"
Data loss suspected                Platform Lead  "Data loss risk: PVC bound to existing PV with data, do not delete"
Scope expanding beyond namespace   Platform team  "Multi-namespace impact: storage provisioner down, all dynamic PVCs failing cluster-wide"

Post-Incident

  • Update monitoring if alert was noisy or missing
  • File postmortem if P1/P2
  • Update this runbook if steps were wrong or incomplete
  • Confirm the StorageClass name in the application manifest is correct in git
  • Check if cluster has zone-aware nodes to match topology requirements

Common Mistakes

  1. Wrong StorageClass name: This is the most common cause. The StorageClass name in the PVC must exactly match an existing StorageClass object — Kubernetes does not fuzzy-match and never falls back to the default for a PVC that names a nonexistent class. The default StorageClass (the one carrying the storageclass.kubernetes.io/is-default-class: "true" annotation) applies only when the PVC omits storageClassName entirely. Always run kubectl get storageclass to confirm the exact name before deploying.
  2. Zone affinity mismatch with the pod: When using WaitForFirstConsumer binding mode, the PVC cannot bind until the pod is scheduled. If the pod has a node affinity constraint pinning it to us-east-1a but no nodes or EBS volumes exist in that zone, both the pod and the PVC will be stuck indefinitely. The fix is usually to align the pod's zone affinity with available nodes — not to fight the storage topology.
  3. Deleting a Bound PVC thinking it is safe: If a PVC has ever been Bound and you delete it, the reclaim policy on the PV determines whether the underlying data is deleted (Delete) or preserved (Retain). Always check the reclaim policy before deleting a PVC. Even a PVC that is currently Pending may have a PV from a previous binding that contains data.
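Mistake 3 can be guarded against with a quick listing of PVs whose reclaim policy would destroy data when their PVC is deleted (a sketch; column 4 is RECLAIM POLICY in the default kubectl get pv table layout):

```shell
# List PVs with reclaimPolicy=Delete: deleting their bound PVC deletes the data.
# Usage: kubectl get pv --no-headers | delete_policy_pvs
delete_policy_pvs() {
  awk '$4 == "Delete" { print $1 }'
}
```

If the PV backing your PVC appears in this list, stop and confirm with the data owner before deleting anything.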

Prevention

  • Always set a default StorageClass
  • Use volumeBindingMode: WaitForFirstConsumer to avoid zone issues
  • Monitor CSI driver health
  • Set cloud resource quotas with headroom
  • Test PVC creation in CI/staging before production
