Solution

Triage

  1. Check inode usage:
    df -i
    
  2. Identify which directory tree holds the most inodes (count recursively, since the offender may be nested several levels down):
    for d in /var/*/; do echo "$(find "$d" -xdev 2>/dev/null | wc -l) $d"; done | sort -rn | head -10
    
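  3. Check the specific directory (ls -f skips sorting and per-file stat calls, which matters with millions of entries; the count includes . and ..):
    ls -f /var/spool/mail-sessions/ | head -5
    ls -f /var/spool/mail-sessions/ | wc -l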
    
  4. Identify what creates the files:
    crontab -l
    grep -r "mail-sessions" /etc/cron* /var/spool/cron/
    
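The per-directory count in step 2 can also be done in a single pass that ranks every directory by the number of entries it holds directly. This is a minimal sketch, assuming GNU find (for -printf); top_inode_dirs is a hypothetical helper name, not part of any existing tool.

```shell
#!/usr/bin/env bash
# top_inode_dirs [ROOT] [LIMIT]
# Hypothetical helper: rank directories under ROOT by how many entries
# (files, subdirectories, symlinks) each one contains directly.
top_inode_dirs() {
    local root="${1:-/var}" limit="${2:-10}"
    # -xdev stays on one filesystem; %h prints the parent directory of each entry,
    # so counting identical lines gives the entry count per directory.
    find "$root" -xdev -mindepth 1 -printf '%h\n' 2>/dev/null \
        | sort | uniq -c | sort -rn | head -n "$limit"
}

# Example: top_inode_dirs /var 10
```

Each output line is a count followed by a directory path, so a directory such as /var/spool/mail-sessions/ would surface at the top even though it sits two levels below /var.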

Root Cause

A cron job creates a unique session tracking file in /var/spool/mail-sessions/ for every inbound email. Each file is tiny (< 100 bytes) but still consumes one inode. Over 18 months of processing tens of thousands of emails per day, the directory accumulated approximately 15.7 million files.

The ext4 filesystem was created with the default inode ratio, providing approximately 15.8 million inodes. With 15.7 million consumed by session files (plus inodes used by the rest of the filesystem), the inode table is full. No new files can be created anywhere on the filesystem, even though 55% of the disk space is unused.
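
The figures above can be sanity-checked with shell integer arithmetic. This is a rough sketch under two assumptions that are not stated in the text: 18 months is taken as about 548 days, and the "default inode ratio" is taken to be 16384 bytes per inode (the usual mke2fs default). Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Average number of files created per day: total files / days in operation.
files_per_day() {
    local total_files="$1" days="$2"
    echo $(( total_files / days ))
}

# Filesystem size in GiB implied by an inode count and a bytes-per-inode ratio.
implied_size_gib() {
    local inodes="$1" bytes_per_inode="$2"
    echo $(( inodes * bytes_per_inode / 1024 / 1024 / 1024 ))
}

files_per_day 15700000 548        # assumption: 18 months is roughly 548 days
implied_size_gib 15800000 16384   # assumption: 16384 bytes per inode
```

Under those assumptions the job averaged close to 29,000 files per day, and 15.8 million inodes corresponds to a filesystem of roughly 240 GiB. At under 100 bytes of content per file, the inode table fills long before the data blocks do.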

Fix

Immediate (free inodes):

  1. Delete old session files with find (do not use rm * -- the shell glob expands to millions of arguments and fails with "Argument list too long"):

    find /var/spool/mail-sessions/ -type f -mtime +7 -delete
    
    For faster deletion when every file in the directory can go (this removes all files, not only old ones):
    # Sync an empty directory over the target so rsync deletes everything in it
    empty=$(mktemp -d)
    rsync -a --delete "$empty"/ /var/spool/mail-sessions/
    rmdir "$empty"
    

  2. Verify inodes are freed:

    df -i /
    
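Before running the deletion for real, it can help to count what would be removed. The sketch below wraps the same find expression in a function with a dry-run mode; purge_old_files is a hypothetical name, and the -maxdepth 1 restriction assumes the session files sit directly in the directory rather than in subdirectories.

```shell
#!/usr/bin/env bash
# purge_old_files DIR DAYS [--dry-run]
# Hypothetical helper: count or delete regular files directly inside DIR
# whose modification time is more than DAYS days in the past.
purge_old_files() {
    local dir="$1" days="$2" mode="${3:-}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ "$mode" = "--dry-run" ]; then
        # Only report how many files match; nothing is removed.
        find "$dir" -maxdepth 1 -type f -mtime +"$days" | wc -l
    else
        # find unlinks files one by one, so no argument list is ever built.
        find "$dir" -maxdepth 1 -type f -mtime +"$days" -delete
    fi
}

# Example:
#   purge_old_files /var/spool/mail-sessions 7 --dry-run
#   purge_old_files /var/spool/mail-sessions 7
```

Running the dry run first gives a number to compare against the IUsed column of df -i before and after the real run.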

Permanent fix:

  1. Add a cleanup cron job:

    #!/bin/bash
    # /etc/cron.daily/cleanup-mail-sessions (the shebang must be the first line; make the file executable)
    find /var/spool/mail-sessions/ -type f -mtime +3 -delete
    

  2. Better yet, modify the application to use a database or append to a single log file instead of creating one file per email.

  3. Monitor inode usage:

    # Add an inode alert to monitoring (node_exporter already exports node_filesystem_files_free and node_filesystem_files)
    
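On hosts without a metrics pipeline, the same check can be approximated from df -i output. This is a minimal sketch: the function names and the 90% default threshold are illustrative assumptions, and the script only prints a warning rather than integrating with any particular alerting system.

```shell
#!/usr/bin/env bash
# Print the inode usage percentage (IUse% column) for a mount point, without the % sign.
inode_pct() {
    df -Pi "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed (exit 0) when PCT is at or above THRESHOLD.
over_threshold() {
    local pct="$1" threshold="$2"
    # Filesystems that do not report inode counts show "-"; treat that as not over.
    case "$pct" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$pct" -ge "$threshold" ]
}

# check_inodes [MOUNT] [THRESHOLD]  (defaults: / and 90, both assumptions)
check_inodes() {
    local mount="${1:-/}" threshold="${2:-90}" pct
    pct=$(inode_pct "$mount")
    if over_threshold "$pct" "$threshold"; then
        echo "WARNING: inode usage on $mount is ${pct}% (threshold ${threshold}%)"
        return 1
    fi
    echo "OK: inode usage on $mount is ${pct}%"
}

# Example: check_inodes / 90
```

The non-zero exit status on a breach makes the function usable from cron or from a monitoring agent that runs external checks.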

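For the second permanent fix above (appending to a single log file), the sketch below shows one possible shape. It is an illustration only: record_session, the log path, and the line format are hypothetical, and the real application may need different fields. It assumes flock from util-linux is available.

```shell
#!/usr/bin/env bash
# record_session LOGFILE MESSAGE_ID
# Hypothetical replacement for "one file per email": append one line per
# email to a single log file, which uses one inode regardless of volume.
record_session() {
    local logfile="$1" message_id="$2"
    (
        # Hold an exclusive lock on fd 9 so concurrent writers do not interleave lines.
        flock -x 9
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$message_id" >&9
    ) 9>>"$logfile"
}

# Example: record_session /var/log/mail-sessions.log "msg-12345"
```

A single file like this still grows in size, so it would need rotation, but it no longer consumes one inode per event.
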
Rollback / Safety

  • Before mass deletion, verify the files are safe to remove. Check with the mail team if any files are needed.
  • The find -delete approach is safe and handles files incrementally without building a massive argument list.
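  • The rsync --delete trick is often faster than find -delete on a directory with millions of entries, but it removes every file in the directory. Use it only when nothing in the directory needs to be kept.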

Common Traps

  • Using rm -rf /var/spool/mail-sessions/*. Shell glob expansion will fail with "Argument list too long" for millions of files.
  • Only monitoring disk space. Standard df -h does not show inode usage. Always include df -i in monitoring.
  • Assuming the inode count can be raised in place. On ext4 the bytes-per-inode ratio is fixed when the filesystem is created. You can specify -i (bytes-per-inode) with mkfs.ext4 to allocate more inodes, but this requires reformatting the filesystem.
  • Not checking for hardlinks. If the files have multiple hardlinks, deleting them does not free inodes until all links are removed.
  • Ignoring the root cause. Deleting files is a band-aid. Fix the application to stop creating one file per event.