Solution
Triage
- Check inode usage:
- Identify which directory has the most files:
- Check the specific directory:
- Identify what creates the files:
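The four triage steps above can be sketched as small shell helpers. The original commands are not shown here, so the function names, the cron locations searched, and the example paths are illustrative assumptions, not the exact commands used during the incident.

```shell
#!/usr/bin/env bash
# Triage sketch; all function names are hypothetical.

# 1. Inode usage for the filesystem holding a path (look for IUse% near 100%).
inode_usage() {
    df -i "${1:-/}"
}

# 2. Count entries under each subdirectory of a parent, largest first.
#    -xdev keeps find on one filesystem. Counts assume no newlines in file names.
files_per_dir() {
    local parent=$1 d
    for d in "$parent"/*/; do
        [ -d "$d" ] || continue
        printf '%s\t%s\n' "$(find "$d" -xdev -mindepth 1 | wc -l)" "${d%/}"
    done | sort -rn | head -n "${2:-10}"
}

# 3. Count entries in the suspect directory without sorting or globbing.
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 | wc -l
}

# 4. Look for cron entries that mention the directory (usually needs root).
who_creates() {
    grep -rs -- "$1" /etc/crontab /etc/cron.d /var/spool/cron 2>/dev/null
}

# Usage:
#   inode_usage /var
#   files_per_dir /var/spool
#   count_entries /var/spool/mail-sessions
#   who_creates mail-sessions
```

Avoid a plain ls on the suspect directory: sorting millions of names is slow, which is why the count above goes through find.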
Root Cause
A cron job creates a session tracking file in /var/spool/mail-sessions/ for every inbound email. Each file is tiny (< 100 bytes), but every file consumes one inode regardless of its size. Over 18 months of operation processing thousands of emails per day, the directory accumulated approximately 15.7 million files (an average of roughly 29,000 new files per day).
The ext4 filesystem was created with the default inode ratio, providing approximately 15.8 million inodes. With 15.7 million consumed by session files (plus inodes used by the rest of the filesystem), the inode table is full. No new files can be created anywhere on the filesystem, even though 55% of the disk space is unused.
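To see the mismatch between free space and free inodes in one line, a small helper around GNU stat can print both counters for the filesystem holding a path. This is a sketch that assumes GNU coreutils; the function name is hypothetical.

```shell
# Print free/total inodes and free/total blocks for the filesystem holding a path.
# GNU stat -f format: %d = free inodes, %c = total inodes,
#                     %a = blocks available to unprivileged users, %b = total blocks.
fs_summary() {
    stat -f -c 'inodes: %d free of %c | blocks: %a free of %b' "${1:-/}"
}

# Usage:
#   fs_summary /var/spool
```

On an exhausted filesystem the first pair shows zero (or close to zero) free inodes while the second pair still shows plenty of free blocks.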
Fix
Immediate (free inodes):
- Delete old session files in batches (do not use rm *; it will fail with "Argument list too long"):
- For faster deletion of millions of files:
- Verify inodes are freed:
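A minimal sketch of the immediate cleanup, assuming a 7-day retention cutoff (the cutoff, the function names, and the example paths are assumptions; confirm the retention period before running anything). The rsync variant removes everything in the target directory, not only old files.

```shell
# Delete regular files last modified more than N full days ago.
# find unlinks each file as it walks, so no argument list is ever built.
purge_old() {
    local dir=$1 days=$2
    find "$dir" -maxdepth 1 -type f -mtime +"$days" -delete
}

# Empty a directory by syncing an empty directory over it.
# The caller supplies an existing empty directory, ideally on another
# filesystem (for example a tmpfs): creating one on the full filesystem
# would itself need a free inode.
# -r is used instead of -a so the target keeps its own owner and permissions.
purge_all_rsync() {
    local dir=$1 empty=$2
    [ -d "$empty" ] && [ -z "$(ls -A "$empty")" ] || {
        echo "refusing: $empty is not an empty directory" >&2
        return 1
    }
    rsync -r --delete "$empty"/ "$dir"/
}

# Usage:
#   purge_old /var/spool/mail-sessions 7
#   df -i /var/spool        # IUse% should drop after the purge
```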
Permanent fix:
- Add a cleanup cron job:
- Better yet, modify the application to use a database or append to a single log file instead of creating one file per email.
- Monitor inode usage:
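The cleanup job and the inode monitor could look like the sketch below. The 80% threshold, the schedules, the 7-day retention, and the script path in the crontab comments are all assumptions to adapt; the function names are hypothetical.

```shell
# Print the inode usage percentage (IUse% column) for the filesystem holding a path.
inode_pct() {
    df -Pi "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Exit status: 0 = below limit, 1 = at or above limit, 3 = no inode data
# (some filesystems report "-" instead of a percentage).
check_inodes() {
    local path=$1 limit=${2:-80} pct
    pct=$(inode_pct "$path")
    case $pct in
        ''|*[!0-9]*) echo "UNKNOWN: no inode data for $path"; return 3 ;;
    esac
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: inode usage on $path is ${pct}%"
        return 1
    fi
    echo "OK: inode usage on $path is ${pct}%"
}

# Example crontab entries (hypothetical schedule and script path):
#   0 3 * * *     find /var/spool/mail-sessions -maxdepth 1 -type f -mtime +7 -delete
#   */15 * * * *  /usr/local/bin/check_inodes.sh /var/spool 80
```

The exit codes follow the common OK/warning/unknown convention used by many monitoring agents, so the check can be wired into an existing alerting setup rather than relying on cron mail.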
Rollback / Safety
- Before mass deletion, verify the files are safe to remove. Check with the mail team if any files are needed.
- The find -delete approach is safe and handles files incrementally without building a massive argument list.
- The rsync --delete trick is the fastest method for deleting millions of files from a single directory. Note that syncing from an empty source removes every file in the target, not only the old ones.
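Before any mass deletion, a dry run helps confirm the scope with the mail team. The sketch below only counts and samples the files a cleanup would remove; the function names and the 7-day cutoff in the usage lines are assumptions.

```shell
# Count the files a purge would remove, without deleting anything.
preview_purge() {
    local dir=$1 days=$2
    find "$dir" -maxdepth 1 -type f -mtime +"$days" | wc -l
}

# Print a small sample of candidate files for review (default 20).
sample_purge() {
    local dir=$1 days=$2 n=${3:-20}
    find "$dir" -maxdepth 1 -type f -mtime +"$days" | head -n "$n"
}

# Usage:
#   preview_purge /var/spool/mail-sessions 7
#   sample_purge  /var/spool/mail-sessions 7 5
```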
Common Traps
- Using rm -rf /var/spool/mail-sessions/*. Shell glob expansion will fail with "Argument list too long" for millions of files.
- Only monitoring disk space. Standard df -h does not show inode usage. Always include df -i in monitoring.
- Assuming reformatting fixes it. You can specify -i (bytes-per-inode) with mkfs.ext4 to allocate more inodes, but this requires reformatting the filesystem.
- Not checking for hardlinks. If the files have multiple hardlinks, deleting them does not free inodes until all links are removed.
- Ignoring the root cause. Deleting files is a band-aid. Fix the application to stop creating one file per event.
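Two of the traps above can be checked quickly from the shell. This is a sketch: the function name is hypothetical, and the mkfs.ext4 line is left as a comment because it destroys all data on the target device (the device name and the 4096 value are placeholders, not a recommendation).

```shell
# List files in a directory that have more than one hard link.
# Deleting one name for such a file does not free its inode.
multi_linked() {
    find "$1" -maxdepth 1 -type f -links +1
}

# Show the kernel limit behind "Argument list too long" (bytes of argv + environment).
arg_limit() {
    getconf ARG_MAX
}

# Usage:
#   multi_linked /var/spool/mail-sessions | head
#   arg_limit
#
# Allocating more inodes requires a new filesystem (DESTROYS existing data):
#   mkfs.ext4 -i 4096 /dev/sdX1    # one inode per 4096 bytes; placeholder device
```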