
Pattern: Disk Full (Reserved Blocks Gone)

ID: FP-003 Family: Resource Exhaustion Frequency: Common Blast Radius: Single Service to Multi-Service Detection Difficulty: Moderate

The Shape

ext4 and other Linux filesystems reserve 5% of blocks for the root user by default. This means the filesystem appears full to non-root processes at 95% utilization, not 100%. The gap between "disk full alert at 90%" and "actual failure at 95%" is often smaller than expected, and runaway log files or growing data can consume that buffer overnight.

How You'll See It

In Linux/Infrastructure

$ df -h /var/log
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   47G     0  100% /var/log

$ tune2fs -l /dev/sda1 | grep "Reserved block"
Reserved block count:      655360    # 5% of 50G at 4KiB blocks = 2.5GB reserved for root

Services running as non-root (nginx, postgres, app processes) cannot create or extend files. Root can still write, drawing on the reserved blocks, so sudo touch /var/log/test succeeds while touch /var/log/test as the app user fails with ENOSPC. This asymmetry makes for confusing debugging.
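The reserved-block math can be checked with shell arithmetic; a minimal sketch using illustrative figures (a 50 GiB filesystem, 4 KiB blocks, the default 5% reservation — confirm "Block size" and "Reserved block count" from tune2fs -l on your own filesystem):

```shell
#!/bin/sh
# 5% of a 50 GiB filesystem with 4 KiB blocks (illustrative numbers).
FS_BLOCKS=$(( 50 * 1024 * 1024 * 1024 / 4096 ))    # 13107200 total blocks
RESERVED_BLOCKS=$(( FS_BLOCKS * 5 / 100 ))          # default 5% reservation
RESERVED_MIB=$(( RESERVED_BLOCKS * 4096 / 1024 / 1024 ))
echo "${RESERVED_BLOCKS} blocks = ${RESERVED_MIB} MiB reserved for root"
```

Running this prints 655360 blocks = 2560 MiB, which is the ~2.5GB buffer non-root processes can never touch.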

In Kubernetes

Node ephemeral storage fills from pod logs and writable container layers. The kubelet begins evicting pods when the nodefs.available signal drops below its eviction threshold. Pods are evicted even though df shows the node isn't at 100% — the reserved-block buffer accounts for the gap.
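One mitigation on the Kubernetes side is to cap per-pod ephemeral storage so a single pod's logs cannot fill the node; a sketch of the relevant pod-spec fragment (the 1Gi/2Gi figures are arbitrary examples, not recommendations):

```yaml
# Container spec fragment: the kubelet evicts the pod if its ephemeral
# usage (logs, emptyDir, writable layer) exceeds the limit.
resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"
```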

In CI/CD

Build artifacts accumulate in the agent workspace. The disk appears to have 5% free but all writes fail because the 5% is the reserved root block allocation, not actually available to the build process running as ci-user.
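A common guard here is a pre-build cleanup step that prunes stale artifacts; a minimal sketch, assuming artifacts are tarballs under a WORKSPACE directory and a 7-day retention window (both assumptions; the demo setup lines only exist to make the sketch self-contained):

```shell
#!/bin/sh
# Prune *.tar artifacts older than 7 days from the agent workspace (sketch).
WORKSPACE="${WORKSPACE:-/tmp/ci-workspace}"
mkdir -p "$WORKSPACE"
# Demo setup: one stale artifact, one fresh one.
touch -d '30 days ago' "$WORKSPACE/old-build.tar"
touch "$WORKSPACE/new-build.tar"
# The actual cleanup: delete tarballs not modified in the last 7 days.
find "$WORKSPACE" -name '*.tar' -mtime +7 -delete
ls "$WORKSPACE"
```

After the run only new-build.tar remains; wiring this in before the build step keeps the workspace below the reserved-block ceiling.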

The Tell

Non-root processes get ENOSPC while root can still write. df -h shows 100% used and "Avail" of 0, even though the filesystem isn't physically at 100% of raw capacity — the reserved blocks account for the gap.

Common Misdiagnosis

Looks Like              But Actually        How to Tell the Difference
Inode exhaustion        Block exhaustion    df -i shows inodes OK; df -h shows 100%
Permissions error       Disk full           strace shows ENOSPC; a write as root succeeds
Filesystem corruption   Block limit         fsck passes clean; df -h explains the failure
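The table's first two checks can be run in one pass; a minimal triage sketch (the target path defaults to /tmp and is a placeholder — point it at the failing mount):

```shell
#!/bin/sh
# ENOSPC triage: compare block usage vs inode usage on the affected path.
TARGET="${1:-/tmp}"
echo "== block usage =="
df -h "$TARGET"
echo "== inode usage =="
df -i "$TARGET"
# Blocks at 100%, inodes low    -> block (possibly reserved-block) exhaustion
# Inodes at 100%, blocks free   -> inode exhaustion
# Both look fine, writes fail   -> trace the write with strace; check quotas
```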

The Fix (Generic)

  1. Immediate: Delete or truncate large files (logs, core dumps, temp files). For logs of running processes, use truncate -s 0 /var/log/app.log rather than rm (avoid FP-029).
  2. Short-term: Tune reserved block percentage: tune2fs -m 1 /dev/sda1 (reduce to 1% for non-root filesystems). Implement log rotation with maxsize limits.
  3. Long-term: Separate /var/log onto its own filesystem to prevent log fills from affecting the root filesystem; add alerting at 70% and 85% to catch growth before it hits the ceiling.
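Step 1's truncate-over-rm advice can be demonstrated safely on a scratch file; a sketch using a stand-in path, not a real service log:

```shell
#!/bin/sh
# Why truncate, not rm: truncation frees the blocks immediately even if a
# process still holds the file open; rm leaves the space consumed until the
# last open descriptor closes (the FP-029 trap).
LOG=/tmp/demo-app.log              # stand-in for /var/log/app.log
dd if=/dev/zero of="$LOG" bs=1024 count=64 2>/dev/null
du -k "$LOG"                       # ~64 KiB on disk
truncate -s 0 "$LOG"               # in-place truncation
du -k "$LOG"                       # 0 KiB; path and open fds stay valid
```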

Real-World Examples

  • Example 1: Postgres WAL files grew faster than archiving could remove them. At 95%, postgres (non-root) could no longer create new WAL segments. Database went read-only. Root still had the reserved 5%.
  • Example 2: Docker image layers accumulated on a build node. At 95% df showed "100% used, 0 avail" for non-root builds, while root docker pull still worked.

War Story

Alert fired: "disk at 90%". We said we'd clean it up "in the morning." Overnight the WAL archiver fell behind and within 4 hours the filesystem hit 95%. Postgres stopped accepting writes. We spent 30 minutes confused because root could still write fine — even creating test files in /var/lib/postgresql/. The service account didn't have root's reserved blocks. Lesson: 90% is "clean it now," not "clean it tomorrow."

Cross-References