Pattern: Deleted-But-Open File

ID: FP-029 Family: Silent Corruption Frequency: Common Blast Radius: Single Host Detection Difficulty: Actively Misleading

The Shape

A file is deleted (rm) while a process still holds an open file descriptor to it. On Linux, rm removes only the directory entry; the inode and its data blocks are not freed until the last file descriptor is closed. ls and du no longer see the file, but df still counts its blocks as used. The space appears to be freed (the filename is gone) but physically isn't. Operators who delete a large log file to free space are confused when df shows no change.
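The mechanism is easy to reproduce in a sandbox. A minimal sketch (assumes Linux with /proc mounted; the 10MB size is arbitrary):

```shell
# Reproduce the deleted-but-open pattern in a throwaway directory.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/big.log" bs=1M count=10 status=none

# Hold fd 3 open to the file, then delete it.
exec 3< "$tmpdir/big.log"
rm "$tmpdir/big.log"

# The directory entry is gone...
ls "$tmpdir"                      # prints nothing

# ...but the data is still reachable (and allocated) through the open fd.
wc -c < /proc/self/fd/3           # 10485760 bytes

# Closing the fd is what actually releases the blocks.
exec 3<&-
rm -rf "$tmpdir"
```

Running `df` on the filesystem between the `rm` and the `exec 3<&-` would show no change in used space.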

How You'll See It

In Linux/Infrastructure

$ df -h /var/log
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   49G  500M  99% /var/log

$ rm /var/log/app.log    # Delete the 40GB log file

$ df -h /var/log
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   49G  500M  99% /var/log    # UNCHANGED — space not freed

$ lsof | grep deleted
app       12345  user   3w  REG  8,1 42949672960 /var/log/app.log (deleted)
The process app still holds fd 3 open to the deleted file. The 40GB is still consumed.

In Kubernetes

A sidecar or init container writes to a log file on a shared volume and keeps it open. The main container (or a cleanup job) deletes the file, but the sidecar's open fd keeps the blocks allocated, so pod ephemeral storage usage doesn't decrease despite the delete. Kubernetes still sees the pod using 40GB of ephemeral storage and evicts it for exceeding the storage limit.
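You can check for the pattern from inside a container even when lsof isn't installed, because /proc/*/fd symlinks are annotated with "(deleted)". A sketch (mypod and app are hypothetical pod/container names; adjust for your cluster):

```shell
# Hypothetical pod/container names; lsof is often absent from minimal
# images, but the /proc fd symlinks are always there.
kubectl exec mypod -c app -- sh -c \
  'ls -l /proc/*/fd 2>/dev/null | grep "(deleted)"'
```

Any line in the output names a PID/fd pair still pinning deleted blocks inside that container's filesystem view.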

In CI/CD

A CI build generates a large artifact while a long-running step (a test runner or upload agent, say) keeps the file open. A later cleanup step deletes the artifact, but the storage quota check still shows its size allocated, and the CI agent is evicted from the storage pool.

The Tell

rm a file; df shows no change in available space. lsof | grep deleted shows the file with (deleted) annotation and its size. The process that had the file open is still running.
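The check can be made a one-liner. lsof's +L1 flag restricts output to files whose on-disk link count is below 1, i.e. unlinked but still open; alternatively, grep the full listing and sort by the SIZE/OFF column (field positions can vary slightly across lsof versions and output formats):

```shell
# Only deleted-but-open files:
sudo lsof +L1

# Or: largest deleted-but-open files first (size, PID, path).
sudo lsof | grep '(deleted)' | awk '{print $7, $2, $(NF-1)}' | sort -rn | head
```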

Common Misdiagnosis

  • Looks like: Disk not freeing space after delete. Actually: a deleted-but-open file. Tell: lsof | grep deleted shows the file still allocated.
  • Looks like: Filesystem corruption. Actually: an open file descriptor. Tell: space is freed after restarting the process that holds the fd.
  • Looks like: An OS caching issue. Actually: a file descriptor holding the blocks. Tell: sync && echo 3 > /proc/sys/vm/drop_caches doesn't help; only a process restart (or fd truncation) does.

The Fix (Generic)

  1. Immediate: Identify the process holding the fd: lsof | grep deleted. Then either restart the process (space freed when fd closes) or truncate the file: > /proc/<pid>/fd/<fd> (truncates without needing to restart).
  2. Short-term: Use truncate -s 0 /var/log/app.log instead of rm for log files of running processes; this frees the blocks while keeping the fd valid.
  3. Long-term: Use logrotate with copytruncate option for log rotation, or implement log rotation that sends SIGHUP to the process (causing it to reopen the log file at the new path).
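Step 1's in-place truncation can be sketched as follows. The pid/fd values are placeholders taken from lsof output (PID is the second column; FD is the fourth, minus its mode letter, e.g. "3w" becomes 3):

```shell
# Placeholders — substitute values from `lsof | grep deleted`.
pid=12345
fd=3

# Opening the proc fd symlink with O_TRUNC truncates the underlying
# (deleted) file, releasing its blocks. The descriptor stays open and
# writable, so the owning process keeps running undisturbed.
: > "/proc/$pid/fd/$fd"
```

Note the caveat: if the process writes without O_APPEND, its file offset is preserved, so the next write can recreate a sparse file at the old offset; the blocks are still freed, though.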

Real-World Examples

  • Example 1: DBA ran rm /var/lib/mysql/slow-query.log (40GB). Disk appeared still full. MySQL was still writing to the deleted file. Took kill -HUP <mysql_pid> to cause MySQL to reopen its log files, at which point the 40GB was freed.
  • Example 2: Log aggregation agent kept log file open after it was rotated. Log rotation script deleted the old file. 5 days × 10GB/day = 50GB of "deleted" logs still allocated. Discovered during a disk-full incident.

War Story

Disk at 99%. I deleted the 30GB nginx access log. df: 99% still. I deleted more logs: 10GB gone. Still 99%. I was staring at the filesystem thinking it was broken. A senior engineer walked by, ran lsof | grep deleted | awk '{print $7}' | sort -rn | head, and showed me: 38GB in deleted-but-open files. Three different processes had log files open. We truncated them in place (> /proc/<pid>/fd/<fd>) one by one. 42GB freed without restarting anything. I've never deleted a log file without checking lsof first since.

Cross-References