
Pattern: Inode Exhaustion

ID: FP-001
Family: Resource Exhaustion
Frequency: Common
Blast Radius: Single Service
Detection Difficulty: Actively Misleading

The Shape

A filesystem tracks two independent resources: blocks (raw storage) and inodes (file metadata slots). When inodes run out, new files cannot be created even though df shows plenty of free space. Systems report "No space left on device" while disk usage looks normal — the mismatch between the two metrics is the tell.
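
The mismatch can be checked mechanically. A minimal sketch (the 90%/70% thresholds are illustrative, not canonical) that compares block Use% against inode IUse% from portable `df -P` output:

```shell
# Compare block usage vs. inode usage for one mount point.
check_inodes() {
  mnt="${1:-/}"
  blocks=$(df -P "$mnt" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
  inodes=$(df -Pi "$mnt" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
  # Some filesystems (e.g. btrfs) report "-" for inode counts; treat as 0.
  case "$blocks" in '' | *[!0-9]*) blocks=0 ;; esac
  case "$inodes" in '' | *[!0-9]*) inodes=0 ;; esac
  echo "blocks ${blocks}% inodes ${inodes}%"
  if [ "$inodes" -ge 90 ] && [ "$blocks" -lt 70 ]; then
    echo "TELL: likely inode exhaustion on $mnt"
  fi
}

check_inodes /
```

The same two `df` calls are what the manual diagnosis below walks through.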

How You'll See It

In Linux/Infrastructure

$ touch /tmp/test
touch: cannot touch '/tmp/test': No space left on device

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   12G   38G  24% /          <- plenty of space

$ df -i
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1      3276800 3276800       0  100% /   <- 100% of inodes used

Common cause: an application that writes one file per event (per email, per request, per job). After days of operation, millions of tiny files fill the inode table while the blocks (the actual data) are barely touched.
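
The failure mode in miniature: zero-byte files consume one inode each but no data blocks, so the inode table fills while block usage barely moves.

```shell
# Illustrative repro: 1,000 empty "event" files cost 1,000 inodes, ~0 blocks.
d=$(mktemp -d)
i=0
while [ "$i" -lt 1000 ]; do
  : > "$d/evt.$i"    # zero-byte file: one inode, no data blocks
  i=$((i + 1))
done
echo "inodes consumed: $(ls "$d" | wc -l)"
rm -r "$d"
```

Scale the loop count by a few orders of magnitude and you have the mail-spool and CI-cache incidents described below.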

In Kubernetes

Pod logs show "No space left on device" but kubectl describe node shows disk pressure is NOT triggered. Kubelet's disk-pressure eviction defaults are dominated by block usage: the inode signal (nodefs.inodesFree) has a low default threshold (5% free) and only covers the node filesystem, so exhaustion can go undetected until well past the point where pods are failing. Pods crashloop; creation of new files (log files, socket files, temp files) fails.
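
A node-side spot check is the fastest confirmation. A sketch (the path list holds common defaults; adjust for your distro and runtime) intended to run on the node itself, e.g. via `kubectl debug node/<name> -it --image=busybox` followed by `chroot /host`:

```shell
# Print inode use for the filesystems kubelet and the runtime write to.
for p in /var/lib/kubelet /var/lib/containerd /var/log/pods /; do
  [ -d "$p" ] && df -Pi "$p" | awk 'NR==2 { print $6, "IUse%:", $5 }'
done
```

If any of these report IUse% at or near 100 while `kubectl describe node` is quiet, you are looking at this pattern.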

In CI/CD

Build runner accumulates test artifacts, coverage reports, or cache entries — one file per test case — over weeks. New build fails at "write artifact" step even though the build agent has 40GB free.
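
The usual remedy on runners is an artifact TTL. A hypothetical sweep, assuming a CACHE_DIR layout and 14-day window that you would adapt to your runner:

```shell
# Delete cache/artifact files (and so their inodes) older than 14 days,
# then prune the now-empty directories. CACHE_DIR is an assumption.
CACHE_DIR="${CACHE_DIR:-$HOME/.cache/ci-artifacts}"
mkdir -p "$CACHE_DIR"
find "$CACHE_DIR" -xdev -type f -mtime +14 -delete
find "$CACHE_DIR" -xdev -mindepth 1 -type d -empty -delete
```

Run it from cron or a pre-build hook; `-xdev` keeps the sweep from crossing into other mounts.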

The Tell

df says disk is 24% full. df -i says inodes are 100% used. Any filesystem operation that creates a file fails with "No space left on device."

Common Misdiagnosis

Looks Like         But Actually       How to Tell the Difference
Disk full          Inode exhaustion   df -h shows free space; df -i shows 100% inode use
Application bug    Filesystem limit   Error is consistent across all file-creating operations, not just one code path
Permissions error  No inodes          strace touch /tmp/x returns ENOSPC, not EACCES
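
When strace isn't handy, the error text alone separates these cases. A triage sketch (the probe path is illustrative):

```shell
# ENOSPC with free blocks points at inodes; EACCES points at permissions.
probe="/tmp/inode-probe.$$"
if msg=$(touch "$probe" 2>&1); then
  rm -f "$probe"
  echo "file creation OK"
else
  case "$msg" in
    *"No space left"*)     echo "ENOSPC: compare df -h with df -i" ;;
    *"Permission denied"*) echo "EACCES: check ownership and modes" ;;
    *)                     echo "other failure: $msg" ;;
  esac
fi
```

Point the probe at the filesystem that is actually failing, not just /tmp, since inode exhaustion is per-filesystem.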

The Fix (Generic)

  1. Immediate: Find the directories holding the most files: find / -xdev -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -5 (-type f restricts the count to files, which is what consumes the inodes). Delete or archive the worst offender.
  2. Short-term: Restart the service causing accumulation; implement log rotation or artifact TTL.
  3. Long-term: Tune filesystem inode ratio at mkfs time (mkfs.ext4 -i <bytes-per-inode>); switch to a naming scheme that uses fewer files (e.g., append to a single log, use object storage for small objects).
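
For step 3, a reasonable -i value can be estimated from the workload you already have. A sketch that derives the observed bytes-per-inode ratio from `df` output (pick an mkfs value somewhat below it for headroom):

```shell
# Average bytes of data per allocated inode on an existing filesystem.
mnt="${MNT:-/}"
used_kb=$(df -Pk "$mnt" | awk 'NR==2 { print $3 }')
used_inodes=$(df -Pi "$mnt" | awk 'NR==2 { print $3 }')
if [ "${used_inodes:-0}" -gt 0 ] 2>/dev/null; then
  echo "observed bytes-per-inode on $mnt: $(( used_kb * 1024 / used_inodes ))"
else
  echo "no inode data for $mnt (filesystem may not expose inode counts)"
fi
```

If the observed ratio is far below mkfs.ext4's default (16384 bytes per inode), that is the quantitative argument for reformatting with a smaller -i.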

Real-World Examples

  • Example 1: Mail server writes one .eml file per inbound message into a spool directory. After 15 million messages, inodes exhausted on /var/spool/mail; new mail delivery fails with "No space left."
  • Example 2: CI runner caches one file per URL in its HTTP cache. After 8 months, 3.2 million cache entries; build fails at artifact upload step despite 60GB free disk.

War Story

We got paged at 2am for "disk full on the mail relay" — but df showed 8% used. I stared at it for 20 minutes convinced the alert was wrong. Then someone on the call ran df -i on a whim and we saw 100%. The spool directory had 15 million session temp files that the daemon never cleaned up. find /var/spool -type f -name '*.tmp' | wc -l returned 15,742,003. We deleted them with find ... -delete (which, unlike rm *.tmp, never hits the ARG_MAX limit), working through the spool subdirectory by subdirectory, and mail started flowing again in four minutes.

Cross-References