grep & Regular Expressions - Footguns

Mistakes that cause missed matches, false positives, broken scripts, and wasted hours during incidents.


1. Greedy .* Matching Too Much

. matches any character, and * is greedy. .* therefore extends the match as far as it can: a pattern delimited by quotes runs from the first quote on the line to the last one, not to the nearest.

echo 'name="foo" value="bar"' | grep -oE '".*"'
# Output: "foo" value="bar"    <-- grabs first to last quote

# Fix: negated character class (works in all dialects)
echo 'name="foo" value="bar"' | grep -oE '"[^"]*"'
# Output: "foo"
# Output: "bar"
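
The same negated-class idea extends to pulling out the value of one named attribute. This is a minimal sketch, assuming values are double-quoted and contain no escaped quotes; the function name attr_value is made up for illustration, and the name is used as a regex without any anchoring.

```shell
# attr_value NAME: print the value of every NAME="..." pair read from stdin.
# Assumes double-quoted values with no escaped quotes inside them.
attr_value() {
    grep -oE "$1=\"[^\"]*\"" | cut -d'"' -f2
}

echo 'name="foo" value="bar"' | attr_value value
```

Because [^"]* cannot cross a quote, the match stops at the closing quote of the requested attribute, and cut then drops the surrounding quotes.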

2. BRE vs ERE Syntax Confusion

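In BRE (the default for grep), +, ?, |, (, ), {, and } are literal characters. GNU grep accepts backslash-escaped forms such as \| and \+ in BRE as an extension, but that is not portable. Forgetting -E is one of the most common grep mistakes.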

# BROKEN: matches the literal string "error|warning"
grep 'error|warning' app.log

# Fix: use -E
grep -E 'error|warning' app.log

If grep returns zero results when you know matches exist, check whether the pattern uses ERE metacharacters without -E.
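
A quick way to see the difference on your own system is to count matches under each dialect. The sketch below uses a throwaway file with invented log lines; the last command shows a spelling that avoids the dialect question entirely by passing several -e patterns.

```shell
log=$(mktemp)
printf '%s\n' 'disk error' 'low memory warning' 'literal error|warning text' > "$log"

# BRE: | is an ordinary character, so only the third line matches.
bre_count=$(grep -c 'error|warning' "$log")

# ERE: | is alternation, so all three lines match.
ere_count=$(grep -cE 'error|warning' "$log")

# Dialect-independent: multiple -e patterns are ORed together.
multi_count=$(grep -c -e 'error' -e 'warning' "$log")

echo "BRE=$bre_count ERE=$ere_count multi=$multi_count"
rm -f "$log"
```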


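3. Recursive grep Following Symlinks

Recursive grep may follow symbolic links, causing infinite loops or searching unintended mounts (like a 2TB NFS share behind a symlink). In current GNU grep, -R follows every symlink it encounters, while -r follows only symlinks named on the command line; older versions and other implementations behave differently, so do not rely on either flag in portable scripts.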

# Safest: use find to control traversal
find /var/log -type f -name '*.log' -exec grep -l "pattern" {} +
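
To check how find treats a symlinked directory, the sketch below builds a small throwaway tree (all paths are invented for the demonstration). It also adds -xdev, which asks find not to cross into other mounted filesystems.

```shell
root=$(mktemp -d)
mkdir -p "$root/logs" "$root/elsewhere"
echo 'pattern here' > "$root/logs/app.log"
echo 'pattern here' > "$root/elsewhere/big.log"
ln -s "$root/elsewhere" "$root/logs/link"   # symlink to a directory outside logs/

# Without -L, find does not descend into symlinked directories;
# -xdev additionally keeps it on the starting filesystem.
hits=$(find "$root/logs" -xdev -type f -name '*.log' -exec grep -l 'pattern' {} +)

echo "$hits"
rm -rf "$root"
```

Only logs/app.log is reported; big.log behind the symlink is never opened.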

4. "grep grep" in Process Lists

# grep finds itself in ps output
ps aux | grep nginx
# Output includes: grep --color=auto nginx

# Fix: bracket trick
ps aux | grep '[n]ginx'

# Fix: use pgrep
pgrep -la nginx
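
Why the bracket trick works can be shown without a real process list. In this sketch, fake_ps is a made-up stand-in that prints two invented ps-style lines.

```shell
# Simulated ps output: one real process and the grep invocation itself.
fake_ps() {
    printf '%s\n' \
        'root  101  nginx: master process /usr/sbin/nginx' \
        'me    202  grep [n]ginx'
}

# The regex [n]ginx matches the text "nginx". The grep command line contains
# "[n]ginx", where "]" sits between "n" and "ginx", so it does not match.
matches=$(fake_ps | grep '[n]ginx')
echo "$matches"
```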

5. Binary Files Silently Suppressed

When grep decides a file is binary, it prints Binary file <name> matches instead of the matching lines. Log files with embedded null bytes (from misbehaving apps) are treated this way, so their matching lines never appear in the output.

# Fix: force text mode
grep -a "ERROR" /var/log/weird.log
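
The sketch below reproduces the situation with a throwaway file containing a stray NUL byte (the log lines are invented). It counts matches with -a and, as an alternative for a grep that lacks -a, strips NUL bytes with tr before searching.

```shell
log=$(mktemp)
# Two ERROR lines, the first one carrying a stray NUL byte (\000).
printf 'ERROR disk full\000\nERROR retry failed\n' > "$log"

# -a treats the file as text, so both matching lines are counted.
with_a=$(grep -a -c 'ERROR' "$log")

# Alternative: remove NUL bytes first, then search the cleaned stream.
stripped=$(tr -d '\000' < "$log" | grep -c 'ERROR')

echo "with -a: $with_a, NULs stripped: $stripped"
rm -f "$log"
```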

6. PCRE Not Available Everywhere

grep -P is not available on macOS's default BSD grep, Alpine's BusyBox grep, or many container images.

# Portable alternative: use ERE with workarounds
grep -oE 'user=[a-zA-Z0-9_]+' auth.log | cut -d= -f2

# Or use perl directly (prints only the captured value, like grep -o)
perl -nle 'print $1 while /user=(\w+)/g' auth.log

Never use grep -P in portable scripts or Dockerfiles.
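
If a script has to run on systems both with and without PCRE support, one option is to probe for -P at run time and fall back to ERE. This is a sketch, not a definitive implementation; extract_users is a hypothetical helper name and the sample log lines are invented.

```shell
# extract_users: print the value after each "user=" read from stdin.
# Uses grep -P when the local grep supports it, otherwise ERE plus cut.
extract_users() {
    if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
        grep -oP 'user=\K[A-Za-z0-9_]+'
    else
        grep -oE 'user=[A-Za-z0-9_]+' | cut -d= -f2
    fi
}

printf '%s\n' 'login ok user=alice' 'login failed user=bob_2 ip=10.0.0.5' | extract_users
```

Both branches print the same values, so in practice the ERE branch alone is usually enough; the probe mainly documents the dependency.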


7. Locale Affecting Character Classes

[a-z] is locale-dependent. In some UTF-8 locales, it matches accented characters or even uppercase letters depending on collation order.

# Fix: use POSIX classes
grep '[[:lower:]]' file.txt

# Or force C locale
LC_ALL=C grep '[a-z]' file.txt

Set LC_ALL=C in scripts processing machine-generated data. It is both faster and more predictable.

Debug clue: If grep '[a-z]' unexpectedly matches uppercase letters or accented characters, your locale is the culprit. Run locale to check. In some UTF-8 locales, [a-z] follows dictionary collation order rather than ASCII byte order, so characters like A, B, a, b are interleaved. LC_ALL=C grep '[a-z]' restores the byte-order behavior most people expect, and is often several times faster on large files because it skips multibyte character handling.


8. Exit Code 1 Kills set -e Scripts

grep returns exit code 1 when no match is found. In set -e scripts, this terminates the script even when "no match" is expected.

# BROKEN: script dies if no errors (the good case)
set -euo pipefail
error_count=$(grep -c "ERROR" app.log)

# Fix: absorb the exit code
error_count=$(grep -c "ERROR" app.log || true)

# Fix: use if
if grep -q "ERROR" app.log; then
    echo "Errors found"
fi

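This is one of the most common causes of grep-related script failures in production. Keep in mind that || true also hides exit code 2, which grep uses for real errors such as a missing or unreadable file.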
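
When "no match" should be tolerated but a real grep failure should still stop the script, the exit code can be inspected instead of discarded. A minimal sketch, assuming a shell that supports local (bash, dash, and BusyBox ash do); count_matches is a hypothetical helper name.

```shell
# count_matches PATTERN FILE: print the number of matching lines.
# Exit status 1 (no match) is treated as success; 2 or higher (a real
# error such as a missing file) is passed back to the caller.
count_matches() {
    local status=0 count
    count=$(grep -c -- "$1" "$2") || status=$?
    if [ "$status" -gt 1 ]; then
        return "$status"
    fi
    printf '%s\n' "$count"
}

log=$(mktemp)
printf '%s\n' 'INFO started' 'INFO stopped' > "$log"
(
    set -eu
    error_count=$(count_matches 'ERROR' "$log")   # no match, but the script keeps running
    echo "errors: $error_count"
)
rm -f "$log"
```

The || status=$? form keeps set -e from firing on the assignment while still recording what grep returned.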


9. -w Breaks on Special Characters

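-w requires the match to be bounded by non-word characters, where word characters are letters, digits, and underscore ([a-zA-Z0-9_]). Dots, hyphens, and other punctuation therefore count as boundaries, and unescaped dots in the pattern still match any character.

printf '%s\n' '192.168.1.1' '192.168.1.1.bak' '192-168-1-1' | grep -w '192.168.1.1'
# Prints all three lines: the "." before "bak" is a word boundary, and each . in the pattern matches "-"

# Fix: escape the dots and use explicit boundary patterns
grep -E '(^|[[:space:]])192\.168\.1\.1($|[[:space:]])' file.txt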
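
When the address comes from a variable, the escaping can be done programmatically. The sketch below is one possible approach, not the only one: match_ip is a hypothetical helper, and its notion of a boundary (no digit or dot on either side) is an assumption that is looser than the whitespace-only pattern above.

```shell
# match_ip IP [FILE...]: print lines containing IP as a complete address.
# Dots are escaped so they match literally, and the address must not touch
# another digit or dot on either side (an assumed definition of "complete").
match_ip() {
    local ip_re
    ip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    shift
    grep -E "(^|[^0-9.])${ip_re}([^0-9.]|\$)" "$@"
}

printf '%s\n' \
    '192.168.1.1 up' \
    '192.168.1.10 up' \
    'backup 192.168.1.1.bak' \
    'src=192.168.1.1' | match_ip '192.168.1.1'
```

With these boundary classes, an address followed directly by a sentence-ending period would be skipped, so adjust them to fit the data being searched.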

10. Not Quoting Regex Patterns

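The shell interprets special characters before grep sees them. Unquoted patterns are subject to glob expansion, and double-quoted patterns still undergo variable expansion.

# BROKEN: if files such as error.log or errors.txt exist here, the shell expands error* into their names
grep error* app.log

# BROKEN (when a literal $USER is intended): $USER expands to your username
grep "status=$USER" app.log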

# Fix: always single-quote regex patterns
grep 'error*' app.log
grep 'status=$USER' app.log

Use single quotes for all grep patterns unless you specifically need variable expansion.
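
When variable expansion is what you want, the expanded value is still interpreted as a regex unless grep is told otherwise. The sketch below, using an invented file and a hypothetical variable named user, compares regex matching with -F (fixed string) and shows -- for patterns that begin with a dash.

```shell
log=$(mktemp)
printf '%s\n' 'status=a.b' 'status=axb' 'status=-v' > "$log"

user='a.b'   # hypothetical value; the dot is a regex metacharacter

# As a regex, "." matches any character, so both a.b and axb match.
regex_count=$(grep -c "status=$user" "$log")

# -F treats the expanded pattern as a fixed string; only a.b matches.
fixed_count=$(grep -cF "status=$user" "$log")

# -- ends option parsing, so a pattern starting with "-" is not read as a flag.
dash_count=$(grep -cF -- '-v' "$log")

echo "regex=$regex_count fixed=$fixed_count dash=$dash_count"
rm -f "$log"
```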