- k8s
- l1
- topic-pack
- k8s-storage --- Portal | Level: L1: Foundations | Topics: Kubernetes Storage | Domain: Kubernetes
Kubernetes Storage - Primer¶
Why This Matters¶
Stateless pods are easy. The moment your workload needs to persist data — databases, message queues, file uploads, ML model checkpoints — you enter Kubernetes storage. Get it wrong and you face data loss, stuck deployments, or pods that refuse to schedule. Storage is where Kubernetes stops being abstract and starts touching real disks, real drivers, and real failure modes.
Core Concepts¶
PersistentVolumes (PV)¶
A PersistentVolume is a cluster-level resource representing a piece of storage. It exists independently of any pod. Think of it as the "disk" that the cluster knows about:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-fast-01
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: fast-ssd
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456
```
PVs can be provisioned statically (admin creates them ahead of time) or dynamically (created on demand via StorageClasses).
PersistentVolumeClaims (PVC)¶
A PVC is a namespace-scoped request for storage. Pods never reference PVs directly — they reference PVCs, and Kubernetes binds PVCs to suitable PVs:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-0
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi
```
The binding is based on access mode, storage class, capacity, and label selectors. Once bound, the relationship is exclusive — no other PVC can claim that PV.
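The binding state is visible from the CLI; a quick inspection sketch using the claim from the example above (these commands assume a live cluster):

```shell
# STATUS should show Bound, with the VOLUME column naming the matched PV
kubectl get pvc data-postgres-0 -n production

# Cluster-wide view: each PV's CLAIM column shows which PVC owns it
kubectl get pv
```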
StorageClasses¶
A StorageClass defines how storage is provisioned. It names a provisioner, sets parameters, and configures reclaim behavior:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
| Field | Purpose |
|---|---|
| `provisioner` | Which CSI driver or in-tree plugin creates volumes |
| `parameters` | Provider-specific knobs (disk type, IOPS, encryption) |
| `reclaimPolicy` | What happens to the PV when its PVC is deleted |
| `volumeBindingMode` | `Immediate` or `WaitForFirstConsumer` |
| `allowVolumeExpansion` | Whether PVCs can request more space after creation |
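The classes available in a cluster can be inspected directly (`fast-ssd` here is the example class defined above):

```shell
# List StorageClasses; the cluster default is marked "(default)"
kubectl get storageclass

# Full parameters, reclaim policy, and binding mode for one class
kubectl describe storageclass fast-ssd
```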
Dynamic Provisioning¶
With a StorageClass in place, you never create PVs manually. The flow:
1. A pod references a PVC
2. The PVC references a StorageClass
3. Kubernetes calls the CSI driver to create a volume
4. A PV is automatically created and bound to the PVC
5. The volume is mounted into the pod
WaitForFirstConsumer delays provisioning until a pod actually needs the volume. This ensures the volume is created in the same availability zone as the node — critical for cloud providers where disks are zone-local.
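The whole flow can be exercised with a claim and a pod that consumes it; a minimal sketch (names and image are illustrative, and the `fast-ssd` class comes from the example above):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scratch-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd   # triggers dynamic provisioning
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: writer
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "echo hello > /data/hello && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /data
  volumes:
    - name: scratch
      persistentVolumeClaim:
        claimName: scratch-data
```

With `WaitForFirstConsumer`, applying only the PVC leaves it `Pending`; the PV is created the moment the pod is scheduled to a node.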
Access Modes¶
| Mode | Abbreviation | Description |
|---|---|---|
| ReadWriteOnce | RWO | Mounted read-write by a single node |
| ReadOnlyMany | ROX | Mounted read-only by many nodes |
| ReadWriteMany | RWX | Mounted read-write by many nodes |
| ReadWriteOncePod | RWOP | Mounted read-write by a single pod (K8s 1.27+) |
RWO is the most common — block storage (EBS, Azure Disk, GCE PD) is inherently single-attach. RWX requires a shared filesystem (NFS, EFS, CephFS, Azure Files). ROX is useful for distributing config or reference data to many pods.
Common mistake: assuming RWO means single-pod. RWO means single-node — multiple pods on the same node can all mount an RWO volume.
Reclaim Policies¶
| Policy | Behavior |
|---|---|
| `Retain` | PV persists after PVC deletion; admin must manually reclaim |
| `Delete` | PV and underlying storage are deleted when the PVC is deleted |
| `Recycle` | Deprecated. Ran `rm -rf /thevolume/*` and made the PV available again |
For production databases, use Retain. For ephemeral workloads, Delete is fine. Never rely on Recycle.
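The policy can also be changed on an existing PV after the fact, for example to protect a dynamically provisioned database volume before deleting its claim:

```shell
# Switch an existing PV from Delete to Retain
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Verify the change took effect
kubectl get pv <pv-name> -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
```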
Under the hood: The Container Storage Interface (CSI) specification was created in December 2017 by engineers from Google, Red Hat, VMware, and others to decouple storage from the Kubernetes codebase. Before CSI, storage vendors had to submit code directly to the Kubernetes repository (called "in-tree" plugins), which slowed both Kubernetes releases and vendor iteration. CSI moved storage drivers "out-of-tree" — vendors ship their own container images that implement a gRPC interface. CSI reached GA in Kubernetes 1.13 (December 2018). The old in-tree plugins (like `kubernetes.io/aws-ebs`) are being migrated to CSI drivers and will eventually be removed.
CSI Drivers¶
The Container Storage Interface (CSI) is the standard for connecting storage systems to Kubernetes. Each cloud or storage vendor ships a CSI driver:
| Provider | Driver |
|---|---|
| AWS EBS | ebs.csi.aws.com |
| AWS EFS | efs.csi.aws.com |
| GCE PD | pd.csi.storage.gke.io |
| Azure Disk | disk.csi.azure.com |
| Azure Files | file.csi.azure.com |
| Ceph RBD | rbd.csi.ceph.com |
| NFS | nfs.csi.k8s.io |
CSI drivers run as DaemonSets (node plugin) and Deployments (controller). Check driver health:

```shell
kubectl get pods -n kube-system -l app=ebs-csi-controller
kubectl get csinodes
kubectl get csidrivers
```
Volume Snapshots¶
Volume snapshots allow point-in-time copies of PVCs, useful for backups and cloning:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-snap-20260315
spec:
  volumeSnapshotClassName: csi-aws-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0
```
Restore from a snapshot by referencing it in a new PVC:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-restored
spec:
  dataSource:
    name: pg-data-snap-20260315
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi
```
Volume snapshots require a CSI driver that supports them and a VolumeSnapshotClass.
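A matching VolumeSnapshotClass might look like this; the name and driver mirror the AWS EBS examples above, and `deletionPolicy` works like a reclaim policy for the underlying snapshot:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-snapclass
driver: ebs.csi.aws.com
deletionPolicy: Delete   # delete the backing snapshot when the VolumeSnapshot goes away
```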
StatefulSet Storage Patterns¶
StatefulSets use `volumeClaimTemplates` to create one PVC per replica. Each PVC follows the naming pattern `<template-name>-<statefulset-name>-<ordinal>`:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  replicas: 3
  serviceName: postgres
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
```
This creates `pgdata-postgres-0`, `pgdata-postgres-1`, and `pgdata-postgres-2`. When a StatefulSet pod is rescheduled, it reattaches to its existing PVC — this is how databases survive pod restarts.
Key behaviors:

- Scaling down does NOT delete PVCs (data is preserved)
- Deleting the StatefulSet does NOT delete PVCs
- PVCs must be deleted manually to reclaim storage
- Scaling back up reattaches existing PVCs by ordinal
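Newer clusters can opt out of this manual cleanup: the StatefulSetAutoDeletePVC feature (beta in Kubernetes 1.27) lets a StatefulSet declare what happens to its PVCs, sketched here as a spec fragment:

```yaml
# StatefulSet spec fragment (requires K8s 1.27+ with the feature enabled)
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # remove PVCs when the StatefulSet itself is deleted
    whenScaled: Retain    # keep PVCs when scaling down (the default behavior)
```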
Gotcha: `WaitForFirstConsumer` is almost always the right choice, but it causes confusion during debugging. A PVC in `WaitForFirstConsumer` mode will stay in `Pending` state indefinitely until a pod actually references it and gets scheduled. This is normal behavior, not an error. The PVC events will say "waiting for first consumer to be created before binding." If you see this and there is no pod referencing the PVC, it is working as designed — the volume will not be provisioned until a pod needs it.

War story: A team deleted a StatefulSet and assumed the PVCs would be cleaned up automatically. They were not — Kubernetes deliberately preserves StatefulSet PVCs to prevent data loss. Three months later, they had 200 orphaned 100Gi PVCs costing $2,000/month on AWS EBS. The fix: audit PVCs with `kubectl get pvc -A --sort-by=.metadata.creationTimestamp` and delete orphans. The prevention: tag PVCs with ownership labels and run periodic cleanup scripts.
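One way to automate the orphan audit is to diff the set of all PVCs against the set of PVCs actually mounted by pods; a sketch assuming `jq` is installed:

```shell
# All PVCs in the cluster, as namespace/name
kubectl get pvc -A -o json \
  | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)"' \
  | sort -u > /tmp/all-pvcs

# PVCs referenced by at least one pod
kubectl get pods -A -o json \
  | jq -r '.items[] | .metadata.namespace as $ns
           | .spec.volumes[]? | select(.persistentVolumeClaim)
           | "\($ns)/\(.persistentVolumeClaim.claimName)"' \
  | sort -u > /tmp/used-pvcs

# Cleanup candidates: claimed storage that no pod is using
comm -23 /tmp/all-pvcs /tmp/used-pvcs
```

Treat the output as candidates, not a kill list — PVCs for scaled-down StatefulSets will also show up here.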
Debugging Storage Issues¶
PVC Stuck in Pending¶
This is the most common storage problem. Check the PVC events:
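For example (name and namespace are placeholders):

```shell
# Events are listed at the bottom of the describe output
kubectl describe pvc <name> -n <namespace>

# Or query the events directly
kubectl get events -n <namespace> --field-selector involvedObject.name=<name>
```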
Common causes:
| Symptom in Events | Cause | Fix |
|---|---|---|
| `no persistent volumes available` | No matching PV exists and no StorageClass can provision one | Check the StorageClass exists and the provisioner is healthy |
| `waiting for first consumer` | `WaitForFirstConsumer` mode — normal until a pod is scheduled | Schedule a pod that references this PVC |
| `storageclass "xxx" not found` | PVC references a nonexistent StorageClass | Create the StorageClass or fix the name |
| `exceeded quota` | Namespace ResourceQuota limits storage | Increase the quota or reduce the request |
| `volume capacity insufficient` | Requested size exceeds what the provisioner can allocate | Reduce the size or check provider limits |
Mount Errors¶
```shell
kubectl describe pod <name> -n <namespace>
# Look for "Unable to attach" or "MountVolume" warnings
kubectl get events -n <namespace> --field-selector reason=FailedMount
```
Common mount failures:
- Multi-attach error: RWO volume still attached to another node (node drain did not complete)
- Wrong filesystem: volume formatted as ext4 but pod expects xfs
- Permission denied: SecurityContext UID does not match filesystem ownership; use fsGroup in the pod spec
- Volume not found: underlying cloud disk was deleted outside Kubernetes
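For the multi-attach case specifically, the VolumeAttachment objects record which node the CSI driver still believes holds the volume; a debugging sketch:

```shell
# Which node each volume is attached to, per the CSI driver
kubectl get volumeattachments

# Inspect the stuck attachment; deleting it forces a detach
# (only after confirming the old node is really gone)
kubectl describe volumeattachment <name>
kubectl delete volumeattachment <name>
```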
Capacity Issues¶
```shell
# Check PV/PVC utilization
kubectl get pv --sort-by=.spec.capacity.storage
kubectl get pvc -A --sort-by=.spec.resources.requests.storage

# Check node disk pressure
kubectl describe node <name> | grep -A5 Conditions

# Expand a PVC (if the StorageClass allows it)
kubectl patch pvc data-postgres-0 -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
```
PVC expansion is online for most CSI drivers but may require pod restart for filesystem resize. Check allowVolumeExpansion: true on the StorageClass.
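Expansion progress is visible on the claim itself; a quick check sketch (class and PVC names reuse the examples above):

```shell
# Confirm the class allows expansion
kubectl get storageclass fast-ssd -o jsonpath='{.allowVolumeExpansion}'

# Watch resize progress; a FileSystemResizePending condition means the
# filesystem grow is waiting for the pod to (re)mount the volume
kubectl describe pvc data-postgres-0 | grep -A5 Conditions
kubectl get pvc data-postgres-0 -o jsonpath='{.status.capacity.storage}'
```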
Best Practices¶
- Always use StorageClasses — avoid static PV provisioning in production
- Use `WaitForFirstConsumer` — prevents zone mismatch on cloud providers
- Set `Retain` for databases — `Delete` is fine for caches and temp data
- Size PVCs generously — expansion is possible but adds operational risk
- Use `fsGroup` in pod security context — ensures the container process owns mounted files
- Snapshot before upgrades — take VolumeSnapshots before database version changes
- Monitor PVC usage — alert on disk usage > 80% before pods hit ENOSPC
- Label your PVs and PVCs — makes cleanup and auditing manageable at scale
- Test failover — drain a node and verify pods reattach to storage on another node
- Document your StorageClasses — teams should know which class to use for which workload
Wiki Navigation¶
Related Content¶
- Case Study: Persistent Volume Stuck Terminating (Case Study, L2) — Kubernetes Storage
- Database Operations on Kubernetes (Topic Pack, L2) — Kubernetes Storage
- Kubernetes Exercises (Quest Ladder) (CLI) (Exercise Set, L1) — Kubernetes Storage
- Kubernetes Storage Flashcards (CLI) (flashcard_deck, L1) — Kubernetes Storage
- Runbook: PVC Stuck in Pending (Runbook, L1) — Kubernetes Storage
- Track: Kubernetes Core (Reference, L1) — Kubernetes Storage
Pages that link here¶
- Anti-Primer: Kubernetes Storage
- Certification Prep: CKA — Certified Kubernetes Administrator
- Certification Prep: CKAD — Certified Kubernetes Application Developer
- Incident Replay: Persistent Volume Stuck Terminating
- K8S Storage
- Kubernetes Ecosystem - Primer
- Kubernetes_Core
- Master Curriculum: 40 Weeks
- Production Readiness Review: Answer Key
- Production Readiness Review: Study Plans
- Runbook: PVC Stuck in Pending
- Symptoms
- Thinking Out Loud: Kubernetes Storage