Pattern: Inode Exhaustion
ID: FP-001 | Family: Resource Exhaustion | Frequency: Common | Blast Radius: Single Service | Detection Difficulty: Actively Misleading
The Shape
A filesystem tracks two independent resources: blocks (raw storage) and inodes (file
metadata slots). When inodes run out, new files cannot be created even though df
shows plenty of free space. Systems report "No space left on device" while disk usage
looks normal — the mismatch between the two metrics is the tell.
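As a rough illustration of why the two resources diverge, the inode budget of an ext4-style filesystem is fixed at creation time by a bytes-per-inode ratio. The sketch below is back-of-the-envelope arithmetic only: the helper name `inode_budget` is hypothetical, and the 16384 ratio is an assumption (a common `mke2fs` default) that happens to reproduce the 3276800 figure in the `df -i` sample in this document. Real `mke2fs` rounds per block group, so actual counts can differ slightly.

```shell
# inode_budget SIZE_BYTES BYTES_PER_INODE
# Approximate inode count for a filesystem created with a fixed
# bytes-per-inode ratio (the value passed to `mkfs.ext4 -i`).
inode_budget() {
  echo $(( $1 / $2 ))
}

# 50 GiB at an assumed 16384 bytes per inode
inode_budget $(( 50 * 1024 * 1024 * 1024 )) 16384
```

Under that assumption, a workload whose files average well below 16 KiB will run out of inodes before it runs out of blocks.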
How You'll See It
In Linux/Infrastructure
$ touch /tmp/test
touch: cannot touch '/tmp/test': No space left on device
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 12G 38G 24% / ← plenty of space
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 3276800 3276800 0 100% / ← 100% inodes
In Kubernetes
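Pod logs show "No space left on device" but `kubectl describe node` does not report
DiskPressure. The kubelet's eviction signals (including `nodefs.inodesFree`) cover only
the node's root and image filesystems, so inode exhaustion on a separately mounted
volume never surfaces as a node condition. Pods crashloop; new file creation (log files,
socket files, temp files) fails.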
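One way to confirm the diagnosis is to read inode usage from inside the affected pod. This is a minimal sketch, not a definitive procedure: the helper name `inode_use_pct`, the pod name `my-app-0`, and the path `/data` are all hypothetical, and it assumes the container image ships a `df` that supports `-i` (some minimal images do not).

```shell
# inode_use_pct: read `df -iP <path>` output on stdin and print the
# IUse% value (5th column of the first data row) without the % sign.
inode_use_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}
```

Assumed usage against a running pod: `kubectl exec my-app-0 -- df -iP /data | inode_use_pct`. A result at or near 100 points to inode exhaustion on that mount rather than a full disk.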
In CI/CD
Build runner accumulates test artifacts, coverage reports, or cache entries — one file per test case — over weeks. New build fails at "write artifact" step even though the build agent has 40GB free.
The Tell
`df` says disk is 24% full. `df -i` says inodes are 100% used. Any filesystem operation that creates a file fails with "No space left on device."
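The comparison can be scripted so the mismatch is caught before a human has to think of running `df -i`. The following is a minimal sketch, assuming GNU `df` (for `--output`); the function name `inode_mismatch` and the 90/80 percent thresholds are illustrative choices, not values from this pattern.

```shell
# inode_mismatch [INODE_MIN] [BLOCK_MAX]
# Reads `df --output=pcent,ipcent,target` style lines on stdin and prints
# mount points where inode usage is high while block usage is still low.
# Assumes mount points contain no spaces (only the first word is printed).
inode_mismatch() {
  awk -v inode_min="${1:-90}" -v block_max="${2:-80}" '
    NR == 1 { next }              # skip the header row
    {
      block = $1 + 0              # "24%" -> 24, "-" -> 0
      inode = $2 + 0
      if (inode >= inode_min && block < block_max)
        printf "%s: blocks %d%%, inodes %d%%\n", $3, block, inode
    }'
}
```

Assumed usage: `df --output=pcent,ipcent,target | inode_mismatch`. Any line it prints is a filesystem showing the tell described above.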
Common Misdiagnosis
| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Disk full | Inode exhaustion | `df -h` shows free space; `df -i` shows 100% inode use |
| Application bug | Filesystem limit | Error is consistent across all file-creating operations, not just one code path |
| Permissions error | No inodes | `strace touch /tmp/x` shows the call failing with ENOSPC, not EACCES |
The Fix (Generic)
- Immediate: Find the directories holding the most entries with `find / -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -5`, then delete or archive the contents of the culprit directory.
- Short-term: Restart the service causing accumulation; implement log rotation or artifact TTL.
- Long-term: Tune the filesystem inode ratio at mkfs time (`mkfs.ext4 -i <bytes-per-inode>`); switch to a storage scheme that uses fewer files (e.g., append to a single log, use object storage for small objects).
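The immediate and short-term steps can be sketched as two small helpers. This assumes GNU `find` (for `-printf` and `-delete`); the function names `top_inode_dirs` and `purge_older_than` are hypothetical, and the TTL helper deletes files irreversibly, so it is worth running the same `find` without `-delete` first.

```shell
# top_inode_dirs [ROOT] [N]
# List the N directories under ROOT (same filesystem only) that hold the
# most entries, as "count directory" lines, largest first.
top_inode_dirs() {
  find "${1:-/}" -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -n "${2:-5}"
}

# purge_older_than DIR DAYS
# A simple artifact TTL: delete regular files under DIR whose modification
# time is more than DAYS full days in the past. Directories are left alone.
purge_older_than() {
  find "$1" -xdev -type f -mtime +"$2" -delete
}
```

Assumed usage: `top_inode_dirs / 5` to locate the culprit, then something like `purge_older_than /var/cache/ci-artifacts 14` from a scheduled job, where the path and the 14-day TTL are placeholders.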
Real-World Examples
- Example 1: Mail server writes one `.eml` file per inbound message into a spool directory. After 15 million messages, inodes exhausted on `/var/spool/mail`; new mail delivery fails with "No space left."
- Example 2: CI runner caches one file per URL in its HTTP cache. After 8 months, 3.2 million cache entries; build fails at artifact upload step despite 60GB free disk.
War Story
We got paged at 2am for "disk full on the mail relay", but `df` showed 8% used. I stared at it for 20 minutes convinced the alert was wrong. Then someone on the call ran `df -i` on a whim and we saw 100%. The spool directory had 15 million session temp files that the daemon never cleaned up. `find /var/spool -type f -name '*.tmp' | wc -l` returned 15,742,003. We deleted them in batches with `find ... -delete` (an `rm` with a shell glob would have hit ARG_MAX limits) and mail started flowing again within four minutes.
Cross-References
- Topic Packs: disk-and-storage-ops, linux-ops
- Case Studies: linux_ops/inode-exhaustion/ — canonical real-world instance
- Footguns: disk-and-storage-ops/footguns.md — "Inode exhaustion from single-file-per-event pattern"
- Related Patterns: FP-003 (disk full looks similar, but `df -h` is the tell), FP-029 (deleted-open-file, another disk-space paradox)