---
tags:
  - k8s
  - l1
  - runbook
  - k8s-storage
---

Portal | Level: L1 Foundations | Topics: Kubernetes Storage | Domain: Kubernetes
# Runbook: PVC Stuck in Pending
| Field | Value |
|---|---|
| Domain | Kubernetes |
| Alert | kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1 for >5 min |
| Severity | P2 |
| Est. Resolution Time | 15-25 minutes |
| Escalation Timeout | 30 minutes — page if not resolved |
| Last Tested | 2026-03-19 |
| Prerequisites | kubectl access, cluster-admin or namespace-admin, kubeconfig configured |
## Quick Assessment (30 seconds)

- If output shows multiple PVCs Pending across different namespaces → the storage provisioner may be down; check the provisioner pod in kube-system before proceeding.
- If output shows a single PVC Pending → continue with the steps below.
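The triage above assumes you have a cluster-wide PVC listing in front of you; one quick way to get it:

```bash
# List every Pending PVC cluster-wide to judge the blast radius
# (STATUS is the third column of `kubectl get pvc -A`)
kubectl get pvc --all-namespaces --no-headers \
  | awk '$3 == "Pending" {print $1 "/" $2}'
```

Many results across namespaces points at the provisioner; a single result points at that claim's spec.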
## Step 1: Describe the PVC for Events

Why: The PVC events section is almost always the fastest path to the root cause — it shows provisioner messages, StorageClass errors, and topology failures.
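The command this step relies on:

```bash
# Show the PVC's status, spec, and recent events
kubectl describe pvc <PVC_NAME> -n <NAMESPACE>
```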
Expected output — look for the Events section:

```
Events:
  Warning  ProvisioningFailed  90s  persistentvolume-controller
           storageclass.storage.k8s.io "fast-ssd" not found
```
- `storageclass ... not found` → wrong StorageClass name — go to Step 2
- `no persistent volumes available for this claim` → no matching PV — go to Step 3
- `waiting for a volume to be created` → provisioner is working; wait longer or check the provisioner — go to Step 4
- `volume node affinity conflict` → zone/topology mismatch — go to Step 5
If no events appear: The provisioner may not be running at all — go to Step 4.
## Step 2: Check StorageClass Exists

Why: A typo in the StorageClass name, or a deleted StorageClass, is a very common cause. Dynamic provisioning silently fails if the StorageClass does not exist.
```bash
# List all available StorageClasses
kubectl get storageclass

# Check the StorageClass name in the PVC spec
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o jsonpath='{.spec.storageClassName}'
```

Expected output:

```
NAME                 PROVISIONER            RECLAIMPOLICY   VOLUMEBINDINGMODE
standard (default)   kubernetes.io/gce-pd   Delete          Immediate
fast-ssd             kubernetes.io/gce-pd   Retain          WaitForFirstConsumer
```
Fix — point the PVC at the correct StorageClass:

```bash
# storageClassName is immutable on an existing PVC, so the claim
# must be deleted and recreated with the correct name
kubectl delete pvc <PVC_NAME> -n <NAMESPACE>

# Recreate with the correct storageClassName field in the manifest
kubectl apply -f <FIXED_PVC_MANIFEST>
```
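A recreated PVC might look like the following sketch; every `<...>` value, the size, and the access mode are placeholders to adapt to your environment:

```bash
# Sketch: recreate the PVC with the corrected StorageClass name
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <PVC_NAME>
  namespace: <NAMESPACE>
spec:
  accessModes:
    - ReadWriteOnce            # placeholder access mode
  resources:
    requests:
      storage: 10Gi            # placeholder size
  storageClassName: fast-ssd   # must exactly match `kubectl get storageclass`
EOF
```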
## Step 3: Check If a Matching PV Is Available

Why: In clusters that use static provisioning (pre-created PVs), the PVC can only bind to a PV that matches its access mode, storage size, and StorageClass. If no matching PV exists, the PVC stays Pending.
```bash
# List available (unbound) PVs
kubectl get pv | grep Available

# Compare the PVC's requirements against each PV's spec
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o yaml | grep -E "accessModes|storage|storageClassName"
kubectl get pv -o yaml | grep -E "accessModes|storage|storageClassName|phase"
```
Note: A PV whose claimRef points to a different PVC is reserved and cannot be reused without clearing the claimRef.
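If a stale claimRef is the blocker and you are certain the old claim is gone, one way to release the PV is to null out the claimRef; treat this as a sketch and confirm the volume is safe to rebind first:

```bash
# DANGER: only do this if the referenced PVC no longer exists and
# the data on the volume is confirmed safe to reuse
kubectl patch pv <PV_NAME> --type merge -p '{"spec":{"claimRef":null}}'

# The PV should return to Available shortly afterwards
kubectl get pv <PV_NAME>
```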
## Step 4: Check Storage Provisioner Logs

Why: The dynamic provisioner (e.g., aws-ebs-csi-driver, gce-pd-csi-driver, ceph-csi) is responsible for creating the underlying volume. If it is crashed or misconfigured, all dynamic PVCs in the cluster will be stuck.
```bash
# Find the provisioner pod — it is usually in kube-system
kubectl get pods -n kube-system | grep -E "csi|provisioner|storage"

# Check provisioner logs
kubectl logs -n kube-system <PROVISIONER_POD_NAME> --tail=50

# Check if the provisioner has the necessary permissions
kubectl get clusterrolebinding | grep provisioner
```

Healthy log output looks like:

```
I0319 10:00:00.000000  1 controller.go:1234] provisioning volume for claim "default/my-pvc"
I0319 10:00:05.000000  1 controller.go:1234] successfully provisioned volume pvc-abc123
```
If the provisioner is crashing or logging errors: restart it with `kubectl rollout restart deployment/<PROVISIONER_DEPLOYMENT> -n kube-system` and check the fresh logs for IAM/permission errors.
If this fails: Escalate to the platform team — the provisioner may need IAM role or cloud credentials reconfiguration.
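As a supplementary check (assuming a CSI-based provisioner), the registered drivers and recent warning events often surface credential or sidecar problems:

```bash
# List CSI drivers registered with the cluster
kubectl get csidrivers

# Recent Warning events in kube-system frequently show provisioner failures
kubectl get events -n kube-system --field-selector type=Warning \
  --sort-by=.lastTimestamp | tail -20
```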
## Step 5: Check Node Affinity and Topology Constraints

Why: When volumeBindingMode: WaitForFirstConsumer is set on the StorageClass (which is correct for zone-aware provisioning), the PVC only binds once a pod is scheduled to use it. If the pod cannot be scheduled, the PVC stays Pending indefinitely.
```bash
# Check the StorageClass binding mode
kubectl get storageclass <STORAGECLASS_NAME> -o jsonpath='{.volumeBindingMode}'

# Check the pod that should be using this PVC
kubectl get pods -n <NAMESPACE> -o wide | grep <APP_NAME>
kubectl describe pod <POD_NAME> -n <NAMESPACE> | grep -A 10 "Events:"

# If the pod is also Pending, check its node affinity
kubectl get pod <POD_NAME> -n <NAMESPACE> -o yaml | grep -A 20 "affinity:"
```

A topology conflict looks like:

```
Events:
  Warning  FailedScheduling  2m  default-scheduler
           0/3 nodes are available: 3 node(s) had volume node affinity conflict.
```

```bash
# Check which zones your nodes are in
kubectl get nodes -o custom-columns='NAME:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone'
```
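If a PV already exists for the claim, comparing its topology constraint with the node zones can confirm the mismatch; a minimal check:

```bash
# The PV's nodeAffinity shows which zone(s) the volume is pinned to
kubectl get pv <PV_NAME> -o jsonpath='{.spec.nodeAffinity.required}'
```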
## Step 6: Manually Provision a PV If Needed

Why: In some environments, dynamic provisioning is unavailable (air-gapped clusters, strict quotas) and a PV must be created manually to match the PVC.
```bash
# Check the exact PVC requirements
kubectl get pvc <PVC_NAME> -n <NAMESPACE> -o yaml

# Create a matching PV — edit the spec to match the PVC's
# accessModes, storage, and storageClassName
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <PV_NAME>
spec:
  capacity:
    storage: <STORAGE_SIZE>   # e.g., 10Gi — must be >= PVC request
  accessModes:
    - <ACCESS_MODE>           # e.g., ReadWriteOnce
  storageClassName: <STORAGECLASS_NAME>
  persistentVolumeReclaimPolicy: Retain
  # Add your volume source here, e.g. for NFS:
  nfs:
    path: <NFS_PATH>
    server: <NFS_SERVER_IP>
EOF
```
```bash
kubectl get pvc <PVC_NAME> -n <NAMESPACE>
```

Expected output:

```
NAME     STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS
my-pvc   Bound    my-pv    10Gi       RWO            fast-ssd
```
If the PVC still does not bind: check for `selector` or `volumeName` fields in the PVC spec that further restrict binding.
## Verification

```bash
# Confirm the issue is resolved
kubectl get pvc -n <NAMESPACE>
kubectl get pods -n <NAMESPACE> | grep <APP_NAME>
```

Success looks like: the PVC shows Bound status, and the pod using the PVC transitions from Pending to Running.
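For scripted verification, `kubectl wait` can block until the claim binds (jsonpath conditions require kubectl v1.23+); a sketch:

```bash
# Wait up to 2 minutes for the PVC to reach Bound
kubectl wait --for=jsonpath='{.status.phase}'=Bound \
  pvc/<PVC_NAME> -n <NAMESPACE> --timeout=120s
```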
If still broken: Escalate — see below.
## Escalation

| Condition | Who to Page | What to Say |
|---|---|---|
| Not resolved in 30 min | SRE on-call | "Kubernetes PVC `<PVC_NAME>` stuck Pending in `<NAMESPACE>`" |
| Data loss suspected | Platform Lead | "Data loss risk: PVC `<PVC_NAME>` in `<NAMESPACE>`" |
| Scope expanding beyond namespace | Platform team | "Multi-namespace impact: storage provisioner down, all dynamic PVCs failing cluster-wide" |
## Post-Incident
- Update monitoring if alert was noisy or missing
- File postmortem if P1/P2
- Update this runbook if steps were wrong or incomplete
- Confirm the StorageClass name in the application manifest is correct in git
- Check if cluster has zone-aware nodes to match topology requirements
## Common Mistakes

- Wrong StorageClass name: This is the most common cause. The StorageClass name in the PVC must exactly match an existing StorageClass object — Kubernetes does not fuzzy-match or fall back to a default automatically (a default is used only when a StorageClass carries the `storageclass.kubernetes.io/is-default-class: "true"` annotation and the PVC omits `storageClassName` entirely). Always run `kubectl get storageclass` to confirm the exact name before deploying.
- Zone affinity mismatch with the pod: When using `WaitForFirstConsumer` binding mode, the PVC cannot bind until the pod is scheduled. If the pod has a node affinity constraint pinning it to `us-east-1a` but no nodes or EBS volumes exist in that zone, both the pod and the PVC will be stuck indefinitely. The fix is usually to align the pod's zone affinity with available nodes — not to fight the storage topology.
- Deleting a Bound PVC thinking it is safe: If a PVC has ever been Bound and you delete it, the reclaim policy on the PV determines whether the underlying data is deleted (`Delete`) or preserved (`Retain`). Always check the reclaim policy before deleting a PVC. Even a PVC that is currently Pending may have a PV from a previous binding that contains data.
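The reclaim-policy check from the last point can be done at a glance; one possible one-liner:

```bash
# Show each PV's reclaim policy and which claim it is bound to
kubectl get pv -o custom-columns='NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,CLAIM:.spec.claimRef.name'
```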
## Prevention

- Always set a default StorageClass
- Use `volumeBindingMode: WaitForFirstConsumer` to avoid zone issues
- Monitor CSI driver health
- Set cloud resource quotas with headroom
- Test PVC creation in CI/staging before production
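Marking a class as the default (the first prevention point) is a one-line annotation; a sketch, assuming `standard` is the class you want as default:

```bash
# Annotate the chosen StorageClass as the cluster default
kubectl patch storageclass standard \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```

Remember to clear the annotation on any previous default so only one class carries it.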
## Cross-References
- Survival Guide: On-Call Survival Guide (pocket card version)
- Topic Pack: Kubernetes Topics (deep background)
- Related Runbook: pod-crashloop.md — if pod crashes after PVC binds
- Related Runbook: deploy-stuck.md — if deployment stalls because pods are waiting on PVC
- Related Runbook: node-not-ready.md — if node issues prevent volume attachment
## Related Content
- Case Study: Persistent Volume Stuck Terminating (Case Study, L2) — Kubernetes Storage
- Database Operations on Kubernetes (Topic Pack, L2) — Kubernetes Storage
- K8s Storage (Topic Pack, L1) — Kubernetes Storage
- Kubernetes Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — Kubernetes Storage
- Kubernetes Storage Flashcards (CLI) (flashcard_deck, L1) — Kubernetes Storage
- Track: Kubernetes Core (Reference, L1) — Kubernetes Storage