grep & Regular Expressions - Footguns
Mistakes that cause missed matches, false positives, broken scripts, and wasted hours during incidents.
1. Greedy .* Matching Too Much
. matches any character. .* grabs everything it can between the first and last possible match points.
echo 'name="foo" value="bar"' | grep -oE '".*"'
# Output: "foo" value="bar" <-- grabs first to last quote
# Fix: negated character class (works in all dialects)
echo 'name="foo" value="bar"' | grep -oE '"[^"]*"'
# Output: "foo"
# Output: "bar"
2. BRE vs ERE Syntax Confusion
In BRE (default grep), +, ?, |, (, ), {, and } are literal characters. Forgetting -E is one of the most common grep mistakes.
# BROKEN: matches the literal string "error|warning"
grep 'error|warning' app.log
# Fix: use -E
grep -E 'error|warning' app.log
If grep returns zero results when you know matches exist, check whether you are accidentally using BRE syntax for ERE metacharacters.
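The difference is easy to confirm on a scratch file. The following is a minimal sketch; the sample log lines and the temporary file are made up for illustration.

```shell
# Hypothetical sample data written to a temporary file.
log=$(mktemp)
printf '%s\n' 'error: disk full' 'warning: slow query' 'info: ok' 'saw error|warning text' > "$log"

# BRE: the | is literal, so only the line containing "error|warning" counts.
bre_count=$(grep -c 'error|warning' "$log")

# ERE: | is alternation, so every error or warning line counts.
ere_count=$(grep -c -E 'error|warning' "$log")

# Alternative that needs no alternation syntax at all: repeat -e.
multi_count=$(grep -c -e 'error' -e 'warning' "$log")

echo "BRE=$bre_count ERE=$ere_count multi=$multi_count"
rm -f "$log"
```

Repeating -e is a reasonable choice when a script must run on a grep whose dialect support is unknown.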
3. grep -r Following Symlinks
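Depending on the implementation and flag, recursive grep may follow symbolic links. Recent GNU grep follows every symlink it meets with -R, but with -r follows only those named on the command line. Following links can cause infinite loops or search unintended mounts (like a 2TB NFS share behind a symlink).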
# Safest: use find to control traversal
find /var/log -type f -name '*.log' -exec grep -l "pattern" {} +
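To see that the find approach stays out of symlinked directories, here is a small sketch using a throwaway directory tree. The directory and file names are assumptions chosen for the demonstration.

```shell
# Hypothetical layout: logs/ holds a real file plus a symlink to another tree.
root=$(mktemp -d)
mkdir -p "$root/logs" "$root/elsewhere"
echo 'pattern here' > "$root/logs/app.log"
echo 'pattern here' > "$root/elsewhere/big.log"
ln -s "$root/elsewhere" "$root/logs/link"

# find does not descend into symlinked directories unless -L is given,
# and -type f skips the symlink itself, so only logs/app.log is searched.
hits=$(find "$root/logs" -type f -name '*.log' -exec grep -l 'pattern' {} +)
echo "$hits"
rm -rf "$root"
```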
4. "grep grep" in Process Lists
# grep finds itself in ps output
ps aux | grep nginx
# Output includes: grep --color=auto nginx
# Fix: bracket trick
ps aux | grep '[n]ginx'
# Fix: use pgrep
pgrep -la nginx
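The bracket trick works because the regex [n]ginx matches the text nginx, while grep's own command line contains the text [n]ginx, which the regex does not match. A sketch with simulated ps lines (the sample lines are invented, not real ps output):

```shell
# Simulated process listings: one real process plus the grep command itself.
plain_sample=$(printf '%s\n' 'root 101 nginx: master process' 'me 202 grep nginx')
bracket_sample=$(printf '%s\n' 'root 101 nginx: master process' 'me 203 grep [n]ginx')

# Plain pattern: the grep command line also contains "nginx", so it is counted too.
plain_hits=$(printf '%s\n' "$plain_sample" | grep -c 'nginx')

# Bracket pattern: the grep command line contains "[n]ginx", which has no "nginx" substring.
bracket_hits=$(printf '%s\n' "$bracket_sample" | grep -c '[n]ginx')

echo "plain=$plain_hits bracket=$bracket_hits"
```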
5. Binary Files Silently Suppressed
When grep decides a file is binary, it prints Binary file <name> matches instead of the matching lines. Log files with embedded null bytes (from misbehaving apps) are treated this way, so their matching lines never reach the output.
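One common workaround is -a (spelled --binary-files=text in GNU grep), which makes grep treat the file as text. A minimal sketch, assuming a made-up log file with one embedded null byte:

```shell
# Hypothetical log file with a null byte on the second line.
f=$(mktemp)
printf 'ERROR one\n\000garbage\nERROR two\n' > "$f"

# -a forces text mode, so matching lines are printed despite the null byte.
text_hits=$(grep -a 'ERROR' "$f")
text_count=$(grep -a -c 'ERROR' "$f")

echo "$text_hits"
rm -f "$f"
```

Be careful when sending -a output to a terminal: truly binary files can contain control sequences.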
6. PCRE Not Available Everywhere
grep -P is not available on macOS's default BSD grep, Alpine's BusyBox grep, or many container images.
# Portable alternative: use ERE with workarounds
grep -oE 'user=[a-zA-Z0-9_]+' auth.log | cut -d= -f2
# Or use perl directly (prints only the captured value, not the whole line)
perl -ne 'print "$1\n" while /user=(\w+)/g' auth.log
Never use grep -P in portable scripts or Dockerfiles.
7. Locale Affecting Character Classes
[a-z] is locale-dependent. In some UTF-8 locales, it matches accented characters or even uppercase letters depending on collation order.
# Fix: use POSIX classes
grep '[[:lower:]]' file.txt
# Or force C locale
LC_ALL=C grep '[a-z]' file.txt
Set LC_ALL=C in scripts processing machine-generated data. It is both faster and more predictable.
Debug clue: If grep '[a-z]' unexpectedly matches uppercase letters or accented characters, your locale is the culprit. Run locale to check. In many UTF-8 locales, [a-z] uses dictionary collation order, not ASCII byte order, meaning characters like A, B, a, b are interleaved. LC_ALL=C grep '[a-z]' restores the byte-order behavior most people expect, and can run 5-10x faster on large files because it skips multibyte character handling.
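Both fixes can be checked against a tiny sample. This is a sketch with invented input; only the C-locale result is fully independent of the system's locale settings.

```shell
# Three sample lines: lowercase, uppercase, digits.
sample=$(printf '%s\n' 'abc' 'ABC' '123')

# In the C locale, [a-z] is a plain byte range and matches only the first line.
c_count=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[a-z]')

# The POSIX class states the intent directly instead of relying on a range.
class_count=$(printf '%s\n' "$sample" | grep -c '[[:lower:]]')

echo "C locale: $c_count, [[:lower:]]: $class_count"
```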
8. Exit Code 1 Kills set -e Scripts
grep returns exit code 1 when no match is found. In set -e scripts, this terminates the script even when "no match" is expected.
# BROKEN: script dies if no errors (the good case)
set -euo pipefail
error_count=$(grep -c "ERROR" app.log)
# Fix: absorb the exit code
error_count=$(grep -c "ERROR" app.log || true)
# Fix: use if
if grep -q "ERROR" app.log; then
echo "Errors found"
fi
This is one of the most common causes of grep-related script failures in production.
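One caveat with || true is that it also hides real failures: grep exits with 2 or higher on errors such as a missing file. The sketch below shows one way to tolerate "no match" while still surfacing errors. The helper name count_matches is hypothetical, and its variables are global because POSIX sh has no local.

```shell
set -eu

# count_matches PATTERN FILE (hypothetical helper)
# Prints the number of matching lines. Treats "no match" (exit 1) as success
# and passes real grep failures (exit 2 or higher) back to the caller.
count_matches() {
  rc=0
  count=$(grep -c -e "$1" -- "$2") || rc=$?
  if [ "$rc" -gt 1 ]; then
    return "$rc"
  fi
  printf '%s\n' "$count"
}

# Example run on a made-up log with no ERROR lines: the script keeps going.
log=$(mktemp)
printf '%s\n' 'INFO started' 'INFO done' > "$log"
error_count=$(count_matches 'ERROR' "$log")
echo "errors: $error_count"
rm -f "$log"
```

Because the grep call sits on the left side of ||, set -e does not abort when grep exits non-zero, and the captured status decides what happens next.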
9. -w Breaks on Special Characters
-w defines a "word" as [a-zA-Z0-9_]+. Dots, hyphens, and other non-word characters create unexpected boundaries.
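printf '192.168.1.1\n192.168.1.10\n192.168.1.1.bak\n10.192.168.1.1\n' | grep -w '192.168.1.1'
# Matches every line except 192.168.1.10: a dot on either side counts as a word boundary
# Fix: use explicit boundary patterns ([[:space:]] is the portable spelling of \s)
grep -E '(^|[[:space:]])192\.168\.1\.1($|[[:space:]])' file.txt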
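The two approaches can be compared on a small file. This is a sketch with invented sample lines; it uses [[:space:]], the POSIX character class for whitespace, and escapes the dots so they match only literal dots.

```shell
# Hypothetical input: one exact address, three look-alikes, one address in a sentence.
ips=$(mktemp)
printf '%s\n' '192.168.1.1' '192.168.1.10' '192.168.1.1.bak' \
  '10.192.168.1.1' 'host 192.168.1.1 up' > "$ips"

# -w accepts any non-word character (including a dot) as a boundary,
# so the .bak and 10.192... lines are counted as well.
word_count=$(grep -c -w '192.168.1.1' "$ips")

# Explicit boundaries: the address must be delimited by whitespace or line edges.
exact_count=$(grep -c -E '(^|[[:space:]])192\.168\.1\.1($|[[:space:]])' "$ips")

echo "-w: $word_count, explicit: $exact_count"
rm -f "$ips"
```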
10. Not Quoting Regex Patterns
The shell interprets special characters before grep sees them. Unquoted patterns are mangled by glob expansion and variable expansion.
# BROKEN: shell expands * as a glob
grep error* app.log
# BROKEN: $USER expands to your username
grep "status=$USER" app.log
# Fix: always single-quote regex patterns
grep 'error*' app.log    # reaches grep intact; as a regex it means "erro" plus zero or more "r"
grep 'status=$USER' app.log
Use single quotes for all grep patterns unless you specifically need variable expansion.
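When variable expansion is needed, double-quote the variable and consider -F so its contents are treated as a literal string rather than a regex. A minimal sketch; the variable name needle and the sample lines are assumptions for illustration.

```shell
# Hypothetical search string that happens to contain regex metacharacters.
needle='a.b*c'
sample=$(printf '%s\n' 'a.b*c' 'aXbbc')

# -F: the variable is matched literally, so only the identical text is found.
fixed_hit=$(printf '%s\n' "$sample" | grep -F -e "$needle")

# Without -F the same string is a regex: "a", any character, zero or more "b", then "c".
regex_hit=$(printf '%s\n' "$sample" | grep -e "$needle")

echo "fixed: $fixed_hit"
echo "regex: $regex_hit"
```

Passing the pattern with -e also protects against values that begin with a hyphen being parsed as options.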