Kubernetes Storage - Primer

Why This Matters

Stateless pods are easy. The moment your workload needs to persist data — databases, message queues, file uploads, ML model checkpoints — you enter Kubernetes storage. Get it wrong and you face data loss, stuck deployments, or pods that refuse to schedule. Storage is where Kubernetes stops being abstract and starts touching real disks, real drivers, and real failure modes.

Core Concepts

PersistentVolumes (PV)

A PersistentVolume is a cluster-level resource representing a piece of storage. It exists independently of any pod. Think of it as the "disk" that the cluster knows about:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-fast-01
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: fast-ssd
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc123def456

PVs can be provisioned statically (admin creates them ahead of time) or dynamically (created on demand via StorageClasses).

PersistentVolumeClaims (PVC)

A PVC is a namespace-scoped request for storage. Pods never reference PVs directly — they reference PVCs, and Kubernetes binds PVCs to suitable PVs:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-0
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi

The binding is based on access mode, storage class, capacity, and label selectors. Once bound, the relationship is exclusive — no other PVC can claim that PV.
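Label selectors give finer control when binding against statically provisioned PVs. A minimal sketch, assuming an admin has pre-created PVs carrying a tier: gold label (the label name is illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-reporting
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  selector:
    matchLabels:
      tier: gold        # only PVs carrying this label are binding candidates
  resources:
    requests:
      storage: 50Gi
```

Note that a PVC with a non-empty selector cannot be dynamically provisioned; selectors only apply to statically created PVs.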

StorageClasses

A StorageClass defines how storage is provisioned. It names a provisioner, sets parameters, and configures reclaim behavior:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

| Field | Purpose |
| --- | --- |
| provisioner | Which CSI driver or in-tree plugin creates volumes |
| parameters | Provider-specific knobs (disk type, IOPS, encryption) |
| reclaimPolicy | What happens to the PV when the PVC is deleted |
| volumeBindingMode | Immediate or WaitForFirstConsumer |
| allowVolumeExpansion | Whether PVCs can request more space after creation |

Dynamic Provisioning

With a StorageClass in place, you never create PVs manually. The flow:

  1. Pod references a PVC
  2. PVC references a StorageClass
  3. Kubernetes calls the CSI driver to create a volume
  4. A PV is automatically created and bound to the PVC
  5. The volume is mounted into the pod

WaitForFirstConsumer delays provisioning until a pod actually needs the volume. This ensures the volume is created in the same availability zone as the node — critical for cloud providers where disks are zone-local.

Access Modes

| Mode | Abbreviation | Description |
| --- | --- | --- |
| ReadWriteOnce | RWO | Mounted read-write by a single node |
| ReadOnlyMany | ROX | Mounted read-only by many nodes |
| ReadWriteMany | RWX | Mounted read-write by many nodes |
| ReadWriteOncePod | RWOP | Mounted read-write by a single pod (beta in K8s 1.27, GA in 1.29) |

RWO is the most common — block storage (EBS, Azure Disk, GCE PD) is inherently single-attach. RWX requires a shared filesystem (NFS, EFS, CephFS, Azure Files). ROX is useful for distributing config or reference data to many pods.
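For example, a claim for a shared upload directory backed by AWS EFS might look like the following sketch (the efs-shared class name is illustrative and assumes the EFS CSI driver is installed):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  accessModes:
    - ReadWriteMany        # requires a shared-filesystem backend such as EFS
  storageClassName: efs-shared
  resources:
    requests:
      storage: 5Gi         # EFS ignores the size, but the API requires the field
```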

Common mistake: assuming RWO means single-pod. RWO means single-node — multiple pods on the same node can all mount an RWO volume.

Reclaim Policies

| Policy | Behavior |
| --- | --- |
| Retain | PV persists after PVC deletion; admin must manually reclaim |
| Delete | PV and underlying storage are deleted when the PVC is deleted |
| Recycle | Deprecated; ran rm -rf /thevolume/* and made the PV available again |

For production databases, use Retain. For ephemeral workloads, Delete is fine. Never rely on Recycle.

Under the hood: The Container Storage Interface (CSI) specification was created in 2017 as a collaboration across container-orchestrator communities (Kubernetes, Mesos, Cloud Foundry, and others) to decouple storage from the Kubernetes codebase. Before CSI, storage vendors had to submit code directly to the Kubernetes repository (so-called "in-tree" plugins), which slowed both Kubernetes releases and vendor iteration. CSI moved storage drivers "out-of-tree": vendors ship their own container images that implement a gRPC interface. CSI reached GA in Kubernetes 1.13 (December 2018). The old in-tree plugins (like kubernetes.io/aws-ebs) have been migrated to CSI drivers, and several have already been removed in recent releases.

CSI Drivers

The Container Storage Interface (CSI) is the standard for connecting storage systems to Kubernetes. Each cloud or storage vendor ships a CSI driver:

| Provider | Driver |
| --- | --- |
| AWS EBS | ebs.csi.aws.com |
| AWS EFS | efs.csi.aws.com |
| GCE PD | pd.csi.storage.gke.io |
| Azure Disk | disk.csi.azure.com |
| Azure Files | file.csi.azure.com |
| Ceph RBD | rbd.csi.ceph.com |
| NFS | nfs.csi.k8s.io |

CSI drivers run as DaemonSets (node plugin) and Deployments (controller). Check driver health:

kubectl get pods -n kube-system -l app=ebs-csi-controller
kubectl get csinodes
kubectl get csidrivers

Volume Snapshots

Volume snapshots allow point-in-time copies of PVCs, useful for backups and cloning:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-snap-20260315
spec:
  volumeSnapshotClassName: csi-aws-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0

Restore from a snapshot by referencing it in a new PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-restored
spec:
  dataSource:
    name: pg-data-snap-20260315
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi

Volume snapshots require a CSI driver that supports them and a VolumeSnapshotClass.
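A VolumeSnapshotClass for the examples above might look like this sketch, assuming the AWS EBS CSI driver (the class name mirrors csi-aws-snapclass used earlier):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-snapclass
driver: ebs.csi.aws.com     # must match the CSI driver backing the PVC
deletionPolicy: Delete      # remove the backing snapshot when the VolumeSnapshot object is deleted
```

Set deletionPolicy: Retain instead if snapshots must survive accidental deletion of the VolumeSnapshot object.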

StatefulSet Storage Patterns

StatefulSets use volumeClaimTemplates to create one PVC per replica. Each PVC follows the naming pattern <template-name>-<statefulset-name>-<ordinal>:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  replicas: 3
  serviceName: postgres
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
This creates pgdata-postgres-0, pgdata-postgres-1, pgdata-postgres-2. When a StatefulSet pod is rescheduled, it reattaches to its existing PVC — this is how databases survive pod restarts.

Key behaviors:

  - Scaling down does NOT delete PVCs (data is preserved)
  - Deleting the StatefulSet does NOT delete PVCs
  - PVCs must be deleted manually to reclaim storage
  - Scaling back up reattaches existing PVCs by ordinal
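Newer clusters can automate this cleanup: StatefulSets accept a persistentVolumeClaimRetentionPolicy (beta since Kubernetes 1.27) that controls whether PVCs survive deletion or scale-down. A sketch of the relevant stanza:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # remove PVCs when the StatefulSet itself is deleted
    whenScaled: Retain    # keep PVCs on scale-down so data survives a later scale-up
  # ... remainder of the spec as above
```

On older clusters (or with the default Retain/Retain policy), manual PVC cleanup remains necessary.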

Gotcha: WaitForFirstConsumer is almost always the right choice, but it causes confusion during debugging. A PVC in WaitForFirstConsumer mode will stay in Pending state indefinitely until a pod actually references it and gets scheduled. This is normal behavior, not an error. The PVC events will say "waiting for first consumer to be created before binding." If you see this and there is no pod referencing the PVC, it is working as designed — the volume will not be provisioned until a pod needs it.

War story: A team deleted a StatefulSet and assumed the PVCs would be cleaned up automatically. They were not — Kubernetes deliberately preserves StatefulSet PVCs to prevent data loss. Three months later, they had 200 orphaned 100Gi PVCs costing $2,000/month on AWS EBS. The fix: audit PVCs with kubectl get pvc -A --sort-by=.metadata.creationTimestamp and delete orphans. The prevention: tag PVCs with ownership labels and run periodic cleanup scripts.

Debugging Storage Issues

PVC Stuck in Pending

This is the most common storage problem. Check the PVC events:

kubectl describe pvc <name> -n <namespace>

Common causes:

| Symptom in Events | Cause | Fix |
| --- | --- | --- |
| no persistent volumes available | No matching PV exists and no StorageClass can provision one | Check the StorageClass exists and the provisioner is healthy |
| waiting for first consumer | WaitForFirstConsumer mode; normal until a pod is scheduled | Schedule a pod that references this PVC |
| storageclass "xxx" not found | PVC references a nonexistent StorageClass | Create the StorageClass or fix the name |
| exceeded quota | Namespace ResourceQuota limits storage | Increase the quota or reduce the request |
| volume capacity insufficient | Requested size exceeds what the provisioner can allocate | Reduce the size or check provider limits |

Mount Errors

kubectl describe pod <name> -n <namespace>
# Look for "Unable to attach" or "MountVolume" warnings

kubectl get events -n <namespace> --field-selector reason=FailedMount

Common mount failures:

  - Multi-attach error: RWO volume still attached to another node (node drain did not complete)
  - Wrong filesystem: volume formatted as ext4 but the pod expects xfs
  - Permission denied: the securityContext UID does not match filesystem ownership; set fsGroup in the pod spec
  - Volume not found: the underlying cloud disk was deleted outside Kubernetes
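The fsGroup fix above can be sketched as a pod-level securityContext; the GID 999 is illustrative and should match the group your container process runs as:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-debug
spec:
  securityContext:
    fsGroup: 999           # kubelet chowns mounted volume contents to this group
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: data-postgres-0
```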

Capacity Issues

# Check PV/PVC utilization
kubectl get pv --sort-by=.spec.capacity.storage
kubectl get pvc -A --sort-by=.spec.resources.requests.storage

# Check node disk pressure
kubectl describe node <name> | grep -A5 Conditions

# Expand a PVC (if StorageClass allows it)
kubectl patch pvc data-postgres-0 -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

PVC expansion is online for most CSI drivers but may require pod restart for filesystem resize. Check allowVolumeExpansion: true on the StorageClass.

Best Practices

  1. Always use StorageClasses — avoid static PV provisioning in production
  2. Use WaitForFirstConsumer — prevents zone mismatch on cloud providers
  3. Set Retain for databases; Delete is fine for caches and temp data
  4. Size PVCs generously — expansion is possible but adds operational risk
  5. Use fsGroup in pod security context — ensures the container process owns mounted files
  6. Snapshot before upgrades — take VolumeSnapshots before database version changes
  7. Monitor PVC usage — alert on disk usage > 80% before pods hit ENOSPC
  8. Label your PVs and PVCs — makes cleanup and auditing manageable at scale
  9. Test failover — drain a node and verify pods reattach to storage on another node
  10. Document your StorageClasses — teams should know which class to use for which workload
