Pattern: Disk Full (Reserved Blocks Gone)
ID: FP-003 · Family: Resource Exhaustion · Frequency: Common · Blast Radius: Single Service to Multi-Service · Detection Difficulty: Moderate
The Shape
ext4 (like ext2 and ext3) reserves 5% of blocks for the root user by default, so the filesystem is full for non-root processes at roughly 95% utilization, not 100%. The gap between a "disk full" alert at 90% and actual failure at 95% is only 5% of the volume, which is often smaller than expected, and a runaway log file or a growing data set can consume that buffer overnight.
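The arithmetic behind that gap can be sketched in a few lines of shell. This is a rough model only: the function names are made up for illustration, sizes are in MiB to stay within integer arithmetic, the 640 MiB/hour growth rate is a hypothetical figure, and filesystem metadata overhead is ignored.

```shell
#!/bin/sh
# Headroom between an alert threshold and the point where non-root writes fail.
# All names and numbers here are illustrative assumptions.

# usable_ceiling_pct RESERVED_PCT -> utilization (%) at which non-root writes fail
usable_ceiling_pct() {
  echo $(( 100 - $1 ))
}

# headroom_mib SIZE_MIB RESERVED_PCT ALERT_PCT -> MiB left between alert and failure
headroom_mib() {
  ceiling=$(usable_ceiling_pct "$2")
  echo $(( $1 * (ceiling - $3) / 100 ))
}

# hours_to_failure HEADROOM_MIB GROWTH_MIB_PER_HOUR -> whole hours until non-root ENOSPC
hours_to_failure() {
  echo $(( $1 / $2 ))
}

# 50 GiB filesystem, 5% reserved, alert at 90%
headroom_mib 51200 5 90      # prints 2560 (2.5 GiB)
hours_to_failure 2560 640    # prints 4
```

On a 50 GiB volume the alert leaves about 2.5 GiB, so a process writing a few hundred MiB per hour exhausts it within a single night.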
How You'll See It
In Linux/Infrastructure
$ df -h /var/log
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 47G 0 100% /var/log
$ tune2fs -l /dev/sda1 | grep "Reserved block count"
Reserved block count: 655360 # 5% of 13107200 blocks (4 KiB each) = 2.5 GiB reserved for root
`sudo touch /var/log/test` works while `touch /var/log/test` (as the app user) fails. This asymmetry makes debugging confusing.
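The same asymmetry can be measured without root. The sketch below assumes GNU coreutils `stat` (the `-f` format letters differ on BSD and macOS); the function names are hypothetical.

```shell
#!/bin/sh
# Compare what root and unprivileged users can still allocate on a mount point.

# reserved_gap_bytes PATH -> bytes that are free for root but not for non-root users
reserved_gap_bytes() {
  # %f = free blocks, %a = free blocks available to non-superuser,
  # %S = fundamental block size (GNU stat format letters).
  counts=$(stat -f -c '%f %a %S' "$1") || return 1
  set -- $counts
  echo $(( ($1 - $2) * $3 ))
}

# blocks_to_bytes COUNT BLOCK_SIZE -> convert a tune2fs "Reserved block count" to bytes
blocks_to_bytes() {
  echo $(( $1 * $2 ))
}

blocks_to_bytes 655360 4096   # prints 2684354560 (2.5 GiB)
```

Running `reserved_gap_bytes /var/log` on a full ext4 volume should report a non-zero gap while the non-root available count is 0; on filesystems without a root reserve the gap is typically 0.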
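In Kubernetes
Node ephemeral storage fills from pod logs. The kubelet triggers eviction when available
(non-reserved) space drops below the `nodefs.available` eviction threshold (or `imagefs.available`, if the container runtime uses a separate image filesystem). Pods are evicted even
though `df` shows the node isn't at 100%: the eviction threshold and the reserved blocks together account for the gap.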
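As a rough illustration of how a percentage eviction signal behaves, the check below compares available bytes against capacity. It is a simplified model, not the kubelet's implementation: the function name is hypothetical, and the 10% figure is the documented default hard threshold for `nodefs.available` on Linux, which your kubelet configuration may override.

```shell
#!/bin/sh
# Simplified model of a percentage-based eviction signal.

# eviction_signal_met AVAIL_BYTES CAPACITY_BYTES THRESHOLD_PCT
# Exit status 0 (true) when available space is strictly below the threshold.
eviction_signal_met() {
  # Compare avail/capacity < pct/100 using integers only.
  [ $(( $1 * 100 )) -lt $(( $2 * $3 )) ]
}

# 4 GiB available on a 50 GiB node filesystem, 10% threshold
if eviction_signal_met 4294967296 53687091200 10; then
  echo "below threshold: eviction expected"
else
  echo "above threshold"
fi
```

With 4 GiB of 50 GiB available (8%), the signal is met even though the disk still looks about 92% used.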
In CI/CD
Build artifacts accumulate in the agent workspace. Going by Size minus Used, the disk appears to have 5% free,
but all writes fail because that 5% is the reserved root block allocation and is not
available to the build process running as ci-user.
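A pre-build guard can fail fast instead of letting a job die halfway through with `ENOSPC`. This is a minimal sketch: the function names and any minimum you pass in are assumptions to adapt per pipeline. It relies on `df`'s Available column, which already excludes the root-reserved blocks.

```shell
#!/bin/sh
# Fail a build early when the workspace has too little space for non-root users.

# avail_kb DIR -> KiB available to unprivileged users on DIR's filesystem
avail_kb() {
  # -P forces the POSIX single-line format; -k reports 1024-byte units.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# require_free_kb DIR MIN_KB -> exit status 0 if enough space, 1 otherwise
require_free_kb() {
  have=$(avail_kb "$1")
  [ -n "$have" ] || return 1
  if [ "$have" -lt "$2" ]; then
    echo "only ${have} KiB available in $1, need $2 KiB" >&2
    return 1
  fi
  return 0
}
```

A job could start with something like `require_free_kb "$PWD" 5242880 || exit 1` to insist on 5 GiB before building.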
The Tell
Non-root processes get `ENOSPC` but root can still write. `df -h` shows 100%, or the filesystem-reported "Avail" is 0, even though the filesystem isn't physically at 100% capacity (reserved blocks account for the gap).
Common Misdiagnosis
| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Inode exhaustion | Block exhaustion | `df -i` shows inodes OK; `df -h` shows 100% |
| Permissions error | Disk full | `strace` shows `ENOSPC`; root user write succeeds |
| Filesystem corruption | Block limit | `fsck` passes clean; `df -h` explains the failure |
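The first row of the table can be turned into a small triage helper. This is a sketch for ext-family filesystems: the function name and output labels are invented here, and filesystems that allocate inodes dynamically may report 0 free inodes and be misclassified.

```shell
#!/bin/sh
# Classify an ENOSPC from three statfs counters.

# classify_enospc FREE_BLOCKS AVAIL_BLOCKS FREE_INODES
#   FREE_BLOCKS  = blocks free for root
#   AVAIL_BLOCKS = blocks free for non-root users
#   FREE_INODES  = free inodes
classify_enospc() {
  if [ "$3" -eq 0 ]; then
    echo "inode-exhaustion"
  elif [ "$1" -eq 0 ]; then
    echo "blocks-exhausted"        # even root is out of space
  elif [ "$2" -eq 0 ]; then
    echo "reserved-blocks-only"    # this pattern: root can write, others cannot
  else
    echo "space-available"         # look at permissions, quotas, deleted-open files
  fi
}

classify_enospc 655360 0 100000   # prints reserved-blocks-only
```

With GNU `stat`, the three inputs can be read in one call: `classify_enospc $(stat -f -c '%f %a %d' /var/log)`.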
The Fix (Generic)
- Immediate: Delete or truncate large files (logs, core dumps, temp files). For logs of running processes, use `truncate -s 0 /var/log/app.log` rather than `rm` (avoid FP-029).
- Short-term: Tune reserved block percentage: `tune2fs -m 1 /dev/sda1` (reduce to 1% for non-root filesystems). Implement log rotation with `maxsize` limits.
- Long-term: Separate `/var/log` onto its own filesystem to prevent log fills from affecting the root filesystem; add alerting at 70% and 85% to catch growth before it hits the ceiling.
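The immediate step can be sketched as a helper that truncates oversized logs in place rather than deleting them. It assumes GNU findutils (for the `M` size suffix) and coreutils `truncate`; the function name and the `*.log` filter are illustrative. It is destructive, so review the matches with a plain `find` first.

```shell
#!/bin/sh
# Truncate (not delete) large log files so open file descriptors stay valid.

# truncate_large_logs DIR MIN_MIB
# Empties every *.log file under DIR that is larger than MIN_MIB MiB.
truncate_large_logs() {
  find "$1" -type f -name '*.log' -size +"$2"M -exec truncate -s 0 {} +
}
```

For example, `truncate_large_logs /var/log 100` empties logs over 100 MiB. If a writer did not open the file in append mode, it keeps writing at its old offset and the file becomes sparse, so rotation with a reopen signal is still the cleaner long-term path.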
Real-World Examples
- Example 1: Postgres WAL files grew faster than archiving could remove them. At 95%, postgres (non-root) could no longer create new WAL segments. Database went read-only. Root still had the reserved 5%.
- Example 2: Docker image layers accumulated on a build node. At 95%, `df` showed "100% used, 0 avail" for non-root builds, while root `docker pull` still worked.
War Story
Alert fired: "disk at 90%". We said we'd clean it up "in the morning." Overnight the WAL archiver fell behind and within 4 hours the filesystem hit 95%. Postgres stopped accepting writes. We spent 30 minutes confused because `root` could still write fine, even creating test files in `/var/lib/postgresql/`. The service account didn't have root's reserved blocks. Lesson: 90% is "clean it now," not "clean it tomorrow."
Cross-References
- Topic Packs: disk-and-storage-ops, linux-ops
- Case Studies: datacenter_ops/disk-full-root-services-down/, linux_ops/runaway-logs-fill-disk/
- Footguns: disk-and-storage-ops/footguns.md
- Related Patterns: FP-001 (inode exhaustion — same symptom, different cause), FP-029 (deleted-open-file — another disk-space paradox)