find — Trivia & Interesting Facts¶
Surprising, historical, and little-known facts about the find command.
find dates back to Version 5 Unix in 1974¶
The find command first appeared in Unix V5 at Bell Labs, making it over 50 years old. It was written by Dick Haight, who also wrote cpio. The basic syntax — find path -name pattern -exec action — has remained remarkably stable across five decades of Unix and Linux evolution.
find traverses the filesystem using the stat() system call¶
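For nearly every file it encounters, find retrieves metadata such as permissions, timestamps, and file type. By default it uses lstat(), which reports on a symlink itself rather than its target, and it switches to stat() when told to follow symlinks with -L. On a filesystem with millions of files, this generates a large amount of metadata I/O, which is why find / on a large server can take minutes. Implementations such as GNU find can skip the call when the directory entry already supplies the file type and the expression needs nothing else.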
The -exec {} + syntax is dramatically faster than -exec {} \;¶
The semicolon form (-exec rm {} \;) forks a new process for every single matching file. The semicolon must be escaped or quoted so the shell passes it to find instead of treating it as a command separator. The plus form (-exec rm {} +) batches arguments like xargs, passing as many filenames as will fit in a single command invocation. On a directory with 100,000 files, the plus form can be 50-100x faster.
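The difference in process count can be seen on a small scale. The following is a minimal sketch, assuming a POSIX shell; the temporary directory and the file names are made up for illustration. Each run of the inner sh prints one line, so counting lines counts invocations.

```shell
# Scratch directory with three hypothetical files
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt" "$tmp/c.txt"

# Semicolon form: the inner command runs once per matching file
per_file=$(find "$tmp" -type f -exec sh -c 'echo invoked' sh {} \; | wc -l)

# Plus form: all matching files are passed to a single invocation
batched=$(find "$tmp" -type f -exec sh -c 'echo invoked' sh {} + | wc -l)

echo "per-file invocations: $per_file"
echo "batched invocations:  $batched"

rm -rf "$tmp"
```

With only three files the batch fits in one invocation; with very large file counts, find starts additional invocations as the argument list fills up.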
find's -newer flag can compare against any timestamp reference¶
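The -newer reference_file test matches files modified more recently than the reference file. The lesser-known -newerXY variant (available in GNU and BSD find) generalizes this: X selects which timestamp of the candidate file to test, and Y selects which timestamp of the reference to compare against, with t meaning the argument is a literal time rather than a file. For example, -newermt "2024-01-01" finds files modified after that date, and -newerct does the same using the inode change time.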
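A short sketch of both forms, assuming a find that supports -newerXY (GNU and BSD do; it is not in POSIX). The file names and dates are hypothetical, and touch -t is used to give the files known modification times.

```shell
tmp=$(mktemp -d)

# Give each file a fixed modification time (format: CCYYMMDDhhmm)
touch -t 202306150000 "$tmp/old.log"
touch -t 202406150000 "$tmp/new.log"
touch -t 202312310000 "$tmp/ref"

# Files modified more recently than the reference file
since_ref=$(find "$tmp" -type f -name '*.log' -newer "$tmp/ref")

# Files modified after a literal date (m = mtime of the file, t = literal time)
since_date=$(find "$tmp" -type f -name '*.log' -newermt '2024-01-01')

echo "$since_ref"
echo "$since_date"

rm -rf "$tmp"
```

Both commands should select only new.log in this setup.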
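GNU find can optimize the evaluation order of expressions¶
GNU find (the version on Linux) reorders tests for efficiency when doing so cannot change the overall result. At the default level, tests that need only the file name, such as -name and -regex, run before tests that need inode data, such as -perm or -size. Predicates with side effects, such as -exec, are not reordered relative to each other. The optimization level is selected with -O1 through -O3, placed before the starting paths. This is unrelated to -d, which is a synonym for -depth and controls traversal order, not expression order. The defaults are almost always correct.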
find -delete is safer than piping to rm, but has a gotcha¶
The -delete action removes matching files directly without spawning a subprocess. It is faster and handles filenames with spaces or special characters correctly. The gotcha: -delete implies -depth (a directory's contents are processed before the directory itself), so -prune cannot be used usefully in the same command. Position matters too: find evaluates the expression from left to right, so a -delete placed before the tests will try to remove everything under the starting point.
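One cautious habit is to run the expression with -print first and only then swap in -delete, keeping -delete as the last term. The sketch below assumes a find with -delete (GNU and BSD); the directory layout and the .tmp suffix are invented for the example.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/cache"
touch "$tmp/keep.txt" "$tmp/cache/a.tmp" "$tmp/cache/b.tmp"

# Dry run: list exactly what the expression matches
find "$tmp" -type f -name '*.tmp' -print

# Same tests, with -delete last so it only applies to files that passed them
find "$tmp" -type f -name '*.tmp' -delete

# Count what is left; only keep.txt should remain
remaining=$(find "$tmp" -type f | wc -l)
echo "files remaining: $remaining"

rm -rf "$tmp"
```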
find can detect filesystem boundary crossings¶
The -xdev (or -mount) option prevents find from crossing into different filesystems. This is essential when searching / but wanting to skip mounted network shares, tmpfs, or procfs. Without -xdev, a simple find / will traverse /proc, /sys, and every NFS mount.
The -prune action is the most misunderstood find feature¶
-prune stops find from descending into a matched directory but does not exclude the directory itself from output. To skip a directory entirely, you must combine it correctly: find / -path /proc -prune -o -name '*.log' -print. Getting the boolean logic wrong is the single most common find mistake in scripts.
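The role of the explicit -print is easiest to see by comparing the two variants side by side. This is a minimal sketch on a throwaway directory; node_modules and the log file names are placeholders, not anything find treats specially.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/node_modules/pkg"
touch "$tmp/src/app.log" "$tmp/node_modules/pkg/debug.log"

# Explicit -print on the right-hand side of -o: the pruned directory is skipped
# and is not printed, because -print only applies to the -name branch
pruned=$(find "$tmp" -path "$tmp/node_modules" -prune -o -name '*.log' -print)

# No explicit action: find wraps the whole expression in an implicit -print,
# so the pruned directory itself (where -prune returned true) is printed too
implicit=$(find "$tmp" -path "$tmp/node_modules" -prune -o -name '*.log' | wc -l)

echo "$pruned"
echo "lines printed without explicit -print: $implicit"

rm -rf "$tmp"
```

In both cases debug.log under the pruned directory is never visited; the difference is only whether the pruned directory's own name appears in the output.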
find can search by file content indirectly using -exec grep¶
While find itself does not read file contents, the pattern find . -name '*.py' -exec grep -l 'pattern' {} + is one of the most common compound commands in Unix. It predates tools like ripgrep and The Silver Searcher (ag) by decades and remains the most portable way to search the contents of files selected by name.
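A small runnable version of this pattern, using made-up file names and a made-up search string, might look like this:

```shell
tmp=$(mktemp -d)
printf 'import os\n'   > "$tmp/a.py"
printf 'print("hi")\n' > "$tmp/b.py"
printf 'import os\n'   > "$tmp/notes.txt"

# find selects files by name; grep -l prints the names of those whose
# contents match. notes.txt matches the text but is filtered out by -name.
matches=$(find "$tmp" -type f -name '*.py' -exec grep -l 'import os' {} +)
echo "$matches"

rm -rf "$tmp"
```

Using + instead of \; means grep is started once for the whole batch of files rather than once per file.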
Modern alternatives like fd are 5-10x faster for common cases¶
The fd tool (fd-find), written in Rust, defaults to regex patterns, respects .gitignore, uses colorized output, and runs parallel directory traversal. For the common case of "find files by name," fd is significantly faster. However, fd deliberately does not replicate find's full predicate system — complex timestamp, permission, or ownership queries still require find.