Pattern: Deleted-But-Open File¶
ID: FP-029 Family: Silent Corruption Frequency: Common Blast Radius: Single Host Detection Difficulty: Actively Misleading
The Shape¶
A file is deleted (rm) while a process still holds an open file descriptor to it.
On Linux, the file's data blocks are not freed until all file descriptors are closed.
The directory entry is removed, so the filename disappears from ls, but the inode and
its disk blocks remain allocated, reachable only through the open descriptor. The space
appears to be freed (the name is gone) but physically isn't, and operators who delete a
large log file to reclaim space are confused when df shows no change.
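The lifecycle above can be reproduced in a few lines of shell. This is a minimal sketch; the temp directory, filename, and fd number are arbitrary, and it assumes a Linux host with /proc:

```shell
# Hold a file open, delete it, and show the data is still reachable
# through the descriptor. Paths and fd numbers here are illustrative.
tmp=$(mktemp -d)
exec 3> "$tmp/app.log"            # open fd 3 for writing
echo "still on disk" >&3          # write through the fd
rm "$tmp/app.log"                 # unlink the name; inode stays allocated
ls "$tmp"                         # prints nothing: the name is gone
recovered=$(cat "/proc/$$/fd/3")  # ...but the data is still readable via /proc
echo "$recovered"
exec 3>&-                         # closing the last fd finally frees the blocks
```

Until the final `exec 3>&-`, du on the directory reports nothing while the blocks are still charged against the filesystem: exactly the df paradox described above.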
How You'll See It¶
In Linux/Infrastructure¶
$ df -h /var/log
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 49G 500M 99% /var/log
$ rm /var/log/app.log # Delete the 40GB log file
$ df -h /var/log
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 49G 500M 99% /var/log # UNCHANGED — space not freed
$ lsof | grep deleted
app 12345 user 3w REG 8,1 42949672960 /var/log/app.log (deleted)
app still holds fd 3 open to the deleted file. The 40GB is still consumed.
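To total the space held hostage this way, lsof's +L1 flag (list only files whose link count is below 1, i.e. unlinked) can be combined with awk over the SIZE/OFF column; a sketch, assuming a Linux host with lsof installed:

```shell
# Sum the sizes of all deleted-but-open files on the host.
# SIZE/OFF is the 7th column of lsof's default output; +L1 restricts
# the listing to files with link count 0 (i.e. deleted).
lsof -nP +L1 2>/dev/null \
  | awk '$7 ~ /^[0-9]+$/ {sum += $7} END {printf "%.1f GB held by deleted-but-open files\n", sum/2^30}'
```

Note this over-counts when several processes hold the same deleted file open, since the file appears once per holder.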
In Kubernetes¶
A sidecar (or init container) writes to a shared log file and holds it open. The main container deletes the log file, but the sidecar's open fd keeps the blocks allocated, so the pod's ephemeral storage usage doesn't decrease despite the delete. Kubernetes still sees the pod using 40GB of ephemeral storage and evicts it for exceeding the storage limit.
In CI/CD¶
A build step generates a large artifact while a long-running step (or background process) still holds the file open. A later cleanup step deletes the artifact, but the storage quota check still shows the space allocated, and the CI agent is evicted from the storage pool.
The Tell¶
rm a file; df shows no change in available space. lsof | grep deleted shows the file with a (deleted) annotation and its size. The process that had the file open is still running.
Common Misdiagnosis¶
| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Disk not freeing space after delete | Deleted-but-open file | lsof \| grep deleted shows the file still allocated |
| Filesystem corruption | Open file descriptor | Space freed after restarting the process that has the fd open |
| OS caching issue | File descriptor holding blocks | sync && echo 3 > /proc/sys/vm/drop_caches doesn't help; only process restart does |
The Fix (Generic)¶
- Immediate: Identify the process holding the fd: lsof | grep deleted. Then either restart the process (space is freed when the fd closes) or truncate the file in place: > /proc/<pid>/fd/<fd> (no restart needed).
- Short-term: Use truncate -s 0 /var/log/app.log instead of rm for log files of running processes; this frees the blocks while keeping the fd valid.
- Long-term: Use logrotate with the copytruncate option, or implement log rotation that sends SIGHUP to the process so it reopens the log file at the new path.
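The truncate-through-/proc fix can be sketched end to end, using the current shell as a stand-in for the stuck process (the path, fd number, and 4 KB payload are illustrative):

```shell
# Reproduce the fix: truncate a deleted-but-open file via /proc,
# freeing its blocks without restarting the writer.
tmp=$(mktemp -d)
exec 4> "$tmp/app.log"
head -c 4096 /dev/zero >&4            # write 4 KB through fd 4
rm "$tmp/app.log"                     # now deleted-but-open
before=$(stat -Lc %s "/proc/$$/fd/4")
: > "/proc/$$/fd/4"                   # truncate in place: blocks freed
after=$(stat -Lc %s "/proc/$$/fd/4")
echo "before=$before after=$after"    # size drops from 4096 to 0; fd 4 still valid
exec 4>&-
```

One caveat: a writer that opened the file without O_APPEND keeps writing at its old offset afterwards, producing a sparse file; the blocks before that offset stay freed, but the apparent file size jumps back up.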
Real-World Examples¶
- Example 1: A DBA ran rm /var/lib/mysql/slow-query.log (40GB). The disk appeared still full because MySQL was still writing to the deleted file. It took kill -HUP <mysql_pid> to make MySQL reopen its log files, at which point the 40GB was freed.
- Example 2: A log aggregation agent kept a log file open after it was rotated, and the rotation script deleted the old file. 5 days × 10GB/day = 50GB of "deleted" logs still allocated. Discovered during a disk-full incident.
War Story¶
Disk at 99%. I deleted the 30GB nginx access log.
df: still 99%. I deleted more logs: 10GB gone. Still 99%. I was staring at the filesystem thinking it was broken. A senior engineer walked by, ran lsof | grep deleted | awk '{print $7}' | sort -rn | head, and showed me: 38GB in deleted-but-open files. Three different processes had log files open. We truncated them in place (> /proc/<pid>/fd/<fd>) one by one. 42GB freed without restarting anything. I've never deleted a log file without checking lsof first since.
Cross-References¶
- Topic Packs: linux-ops, disk-and-storage-ops
- Footguns: disk-and-storage-ops/footguns.md — "rm vs truncate for log cleanup"
- Related Patterns: FP-001 (inode exhaustion — another disk paradox), FP-003 (disk full — same presenting symptom)