Level: L1: Foundations | Topics: grep & Regular Expressions, Bash / Shell Scripting | Domain: CLI Tools
grep & Regular Expressions - Primer¶
Why This Matters¶
Every log file, config file, codebase, and data stream you touch in operations is text. grep is the single most-used tool for finding patterns in that text. You will use it hundreds of times a day — in terminals, in scripts, in pipelines, in CI/CD. An engineer who knows grep and regex well can diagnose problems in minutes that take others hours. An engineer who does not understand regex will write fragile scripts that break on edge cases and miss critical log entries during outages.
grep Basics¶
Name origin:
grep stands for g/re/p, a command from the ed line editor meaning "globally search for a regular expression and print matching lines." Ken Thompson wrote it at Bell Labs in 1973. The name stuck as Unix shorthand and became a verb: "grep it."
grep searches for patterns in files or standard input and prints matching lines.
# Basic usage: search for a literal string
grep "Connection refused" /var/log/syslog
# Search in multiple files
grep "ERROR" /var/log/*.log
# Search recursively through a directory
grep -r "TODO" ./src/
Essential Flags¶
| Flag | Purpose | Example |
|---|---|---|
| `-i` | Case-insensitive match | `grep -i "error" app.log` |
| `-r` | Recursive search through directories | `grep -r "import os" ./src/` |
| `-n` | Show line numbers | `grep -n "segfault" /var/log/kern.log` |
| `-l` | Show only filenames (not matching lines) | `grep -rl "deprecated" ./lib/` |
| `-c` | Count matching lines per file | `grep -c "404" access.log` |
| `-v` | Invert match (show non-matching lines) | `grep -v "^#" config.ini` |
| `-w` | Match whole words only | `grep -w "port" config.yaml` |
| `-o` | Print only the matched part, not the whole line | `grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' access.log` |
| `-A N` | Show N lines after each match | `grep -A 5 "Exception" app.log` |
| `-B N` | Show N lines before each match | `grep -B 3 "FATAL" app.log` |
| `-C N` | Show N lines of context (before and after) | `grep -C 2 "timeout" app.log` |
Context flags (-A, -B, -C) are critical during incident response. A log line saying FATAL means nothing without the stack trace that follows it or the request ID that precedes it.
# Show the 10 lines after every OOM kill message
grep -A 10 "Out of memory" /var/log/kern.log
# Show 3 lines before and after every connection timeout
grep -C 3 "Connection timed out" /var/log/app/server.log
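To see what the context flags actually emit, the following minimal sketch runs them against a small made-up log. The file contents and request IDs are assumptions for illustration only.

```shell
# Build a throwaway log (hypothetical contents)
log=$(mktemp)
cat > "$log" <<'EOF'
req_id=41 start
req_id=41 ok
req_id=42 start
FATAL unhandled exception
  at handler.py:17
  at server.py:88
req_id=43 start
EOF

# One line before (the request ID) and two lines after (the stack trace)
context=$(grep -B 1 -A 2 'FATAL' "$log")
printf '%s\n' "$context"

rm -f "$log"
```

When matches are far apart, GNU grep separates each context group with a line containing `--`.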
Exit Codes¶
grep communicates through exit codes, which matters enormously in scripts:
| Exit Code | Meaning |
|---|---|
| `0` | At least one match found |
| `1` | No matches found |
| `2` | Error (bad syntax, file not found, permission denied) |
# This is how you use grep in conditionals
if grep -q "running" /var/run/app.pid; then
echo "App is running"
else
echo "App is not running"
fi
The -q (quiet) flag suppresses output and just sets the exit code. Essential for scripting.
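A script can tell the three exit codes apart with a `case` on `$?`. This is a minimal sketch: the helper name `grep_outcome` and the sample status file are hypothetical.

```shell
# Hypothetical helper: name which of grep's three outcomes occurred.
# $1 = pattern, $2 = file
grep_outcome() {
    grep -q -- "$1" "$2" 2>/dev/null
    case $? in
        0) echo "match" ;;
        1) echo "no-match" ;;
        *) echo "error" ;;
    esac
}

status_file=$(mktemp)
printf 'state: running\n' > "$status_file"

found=$(grep_outcome "running" "$status_file")
missing=$(grep_outcome "stopped" "$status_file")
broken=$(grep_outcome "running" "$status_file.missing")
echo "$found / $missing / $broken"

rm -f "$status_file"
```

A plain `if ! grep -q ...` cannot tell "pattern absent" from "file unreadable", so scripts that care about the difference need to inspect the code explicitly.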
BRE vs ERE vs PCRE¶
grep supports three regex dialects. Not understanding which one you are using is the #1 source of regex bugs.
BRE — Basic Regular Expressions (default)¶
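In BRE, the characters +, ?, {, }, (, ), and | are treated as literal characters. To use them as metacharacters, you must escape them with a backslash. Note that \{, \}, \(, and \) are standard POSIX, while \+, \?, and \| are GNU extensions that other grep implementations may not support.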
# BRE: must escape + and | to use as metacharacters
grep 'error\|warning' app.log # alternation
grep 'go\+d' words.txt # one or more 'o'
grep '\(foo\)\(bar\) \1' data.txt # backreference
ERE — Extended Regular Expressions (-E or egrep)¶
ERE flips the convention: metacharacters work without escaping. This is what most people expect.
# ERE: metacharacters work naturally
grep -E 'error|warning' app.log # alternation
grep -E 'go+d' words.txt # one or more 'o'
grep -E '(foo)(bar) \1' data.txt # backreference
grep -E 'https?://' urls.txt # optional 's'
Use -E by default unless you have a specific reason not to. ERE syntax is more readable, less error-prone, and matches what you use in most programming languages.
Remember: Mnemonic for grep flags: "-E for Extended, -F for Fixed, -P for Perl." When your regex is not matching, the first thing to check is whether you are using the right dialect: BRE silently treats +, ?, and | as literal characters, which is the #1 source of "why doesn't my regex work?"
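The dialect difference is easy to observe directly. A minimal sketch with a made-up word list, running the same pattern under BRE and ERE:

```shell
words=$(mktemp)
printf 'gd\ngod\ngood\ngo+d\n' > "$words"

# BRE (default): + is a literal plus sign, so only the line "go+d" matches
bre_match=$(grep 'go+d' "$words")

# ERE: + means "one or more of the preceding item", so "god" and "good" match
ere_count=$(grep -cE 'go+d' "$words")

# ERE with an escaped plus goes back to matching the literal text
ere_literal=$(grep -E 'go\+d' "$words")

echo "BRE: $bre_match | ERE count: $ere_count | ERE literal: $ere_literal"
rm -f "$words"
```

Neither run reports an error, which is why a dialect mix-up tends to surface as missing matches rather than as a visible failure.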
PCRE — Perl-Compatible Regular Expressions (-P)¶
PCRE adds lookahead, lookbehind, non-greedy quantifiers, and other advanced features. Available via grep -P on GNU grep (not on macOS default grep).
# Lookahead: match "error" only if followed by a number
grep -P 'error(?=\s+\d+)' app.log
# Lookbehind: match a port number after "port="
grep -P '(?<=port=)\d+' config.ini
# Non-greedy quantifier
grep -P '".*?"' data.json # match shortest quoted string
# Lookbehind combined with -o: print only the username that follows "user="
grep -oP '(?<=user=)\w+' auth.log
Quick Reference: Which Dialect?¶
| Feature | BRE | ERE (`-E`) | PCRE (`-P`) |
|---|---|---|---|
| `.` (any char) | Yes | Yes | Yes |
| `*` (zero or more) | Yes | Yes | Yes |
| `+` (one or more) | `\+` | `+` | `+` |
| `?` (zero or one) | `\?` | `?` | `?` |
| Alternation (pipe character) | Escaped with a backslash | Unescaped | Unescaped |
| `()` (groups) | `\(\)` | `()` | `()` |
| `{n,m}` (quantifier) | `\{n,m\}` | `{n,m}` | `{n,m}` |
| Backreferences | `\1` | `\1` | `\1` |
| Lookahead/lookbehind | No | No | Yes |
| Non-greedy `*?`, `+?` | No | No | Yes |
| `\d` | No | No | Yes |
| `\w`, `\s` | GNU extension | GNU extension | Yes |
Regex Fundamentals¶
Character Classes¶
[abc] # Match a, b, or c
[a-z] # Match any lowercase letter
[A-Za-z0-9] # Match any alphanumeric character
[^0-9] # Match anything that is NOT a digit
POSIX character classes work in all dialects:
[[:alpha:]] # Alphabetic characters
[[:digit:]] # Digits (0-9)
[[:alnum:]] # Alphanumeric
[[:space:]] # Whitespace (space, tab, newline)
[[:upper:]] # Uppercase letters
[[:lower:]] # Lowercase letters
[[:punct:]] # Punctuation characters
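A minimal sketch of the POSIX classes in use, run in default BRE mode against made-up sample lines:

```shell
data=$(mktemp)
printf 'abc\n123\nabc123\n   \nHello, world!\n' > "$data"

# Lines made up entirely of digits (portable BRE has no +, so repeat the class)
digit_lines=$(grep -c '^[[:digit:]][[:digit:]]*$' "$data")

# Lines containing at least one punctuation character
punct_lines=$(grep '[[:punct:]]' "$data")

# Lines that are empty or whitespace-only
blank_lines=$(grep -c '^[[:space:]]*$' "$data")

echo "$digit_lines / $punct_lines / $blank_lines"
rm -f "$data"
```

Note the double brackets: `[:digit:]` is only meaningful inside a bracket expression, so the working form is `[[:digit:]]`.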
Anchors¶
# Lines that start with a comment
grep '^#' config.ini
# Lines that end with a semicolon
grep ';$' source.c
# Empty lines
grep '^$' file.txt
# Lines that are empty or contain only whitespace (POSIX class, portable across greps)
grep -E '^[[:space:]]*$' file.txt
Quantifiers¶
* # Zero or more of preceding element
+ # One or more (ERE/PCRE)
? # Zero or one (ERE/PCRE)
{n} # Exactly n
{n,} # n or more
{n,m} # Between n and m
# Match IP-like patterns (rough)
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log
# Match lines with 3+ consecutive digits
grep -E '[0-9]{3,}' data.txt
Groups and Alternation¶
# Alternation: match either pattern
grep -E 'error|warning|critical' app.log
# Grouping: apply quantifier to group
grep -E '(ab)+' data.txt # match "ab", "abab", "ababab"
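# Backreference: find a whole word that appears again later on the line
# (\b and \w are GNU extensions; the boundaries around \1 stop "the" matching inside "then")
grep -E '\b(\w+)\b.*\b\1\b' text.txt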
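A backreference matches the captured text wherever it recurs, including inside a longer word, so the placement of word boundaries changes the result. A minimal sketch, assuming GNU grep (the `\b` and `\w` escapes are GNU extensions in ERE) and made-up sample text:

```shell
text=$(mktemp)
printf 'the cat then left\nit is is broken\nno repeats here\n' > "$text"

# Boundaries only around the group: "the" is found again inside "then"
loose=$(grep -cE '(\b\w+\b).*\1' "$text")

# Boundaries around the backreference too: only a whole repeated word counts
strict=$(grep -E '\b(\w+)\b.*\b\1\b' "$text")

echo "loose=$loose strict=$strict"
rm -f "$text"
```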
The Dot — Universal Wildcard¶
. matches any single character except newline. This is the most misunderstood regex metacharacter.
# Matches "cat", "car", "can", "cap", ...
grep 'ca.' words.txt
# To match a literal dot, escape it
grep '192\.168\.1\.1' hosts.txt
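The cost of forgetting the escape shows up as false positives. A minimal sketch with a made-up hosts file:

```shell
hosts=$(mktemp)
printf '192.168.1.1 gateway\n192.168.101 typo\n192x168y1z1 junk\n' > "$hosts"

unescaped=$(grep -c '192.168.1.1' "$hosts")      # every . matches any character
escaped=$(grep -c '192\.168\.1\.1' "$hosts")     # only literal dots match
fixed=$(grep -cF '192.168.1.1' "$hosts")         # -F disables regex entirely

echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"
rm -f "$hosts"
```

The unescaped pattern matches all three lines, while the escaped and fixed-string forms match only the real address.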
Common Regex Patterns¶
These are patterns you will reach for repeatedly in production:
IP Addresses¶
# IPv4 (rough match: also accepts invalid octets like 999)
grep -E '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' access.log
# IPv4 (strict — only valid octets 0-255)
grep -P '\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b' access.log
# IPv4 with CIDR notation
grep -E '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/[0-9]{1,2}\b' routes.txt
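The strict pattern above relies on -P. Where PCRE is unavailable, the same octet logic can be written in ERE. This is a minimal sketch in which the variable name `octet` and the sample addresses are assumptions. It uses `-x` so that each line must be an address and nothing else.

```shell
# One octet, 0-255, in portable ERE (no \d, no non-capturing groups)
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

candidates=$(mktemp)
printf '10.0.0.1\n255.255.255.255\n256.1.1.1\n999.1.1.1\n1.2.3\n' > "$candidates"

# -x requires the whole line to match, which turns the pattern into a validator
valid=$(grep -cxE "$octet(\.$octet){3}" "$candidates")

# The rough pattern accepts the out-of-range octets as well
rough=$(grep -cxE '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$candidates")

echo "valid=$valid rough=$rough"
rm -f "$candidates"
```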
Email Addresses¶
# Practical email extraction (not RFC-complete, but covers real-world cases)
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt
URLs¶
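A practical starting point for URL extraction, offered as a minimal sketch rather than a complete URL grammar: it assumes http or https schemes and treats whitespace, quotes, and angle brackets as the end of the URL. The sample text is made up.

```shell
page=$(mktemp)
cat > "$page" <<'EOF'
See https://example.com/docs?page=2 for details.
<a href="http://intranet.local:8080/status">status</a>
ftp://files.example.com is not matched
EOF

# Scheme, then everything up to whitespace, a quote, or an angle bracket
urls=$(grep -oE 'https?://[^[:space:]"<>]+' "$page")
printf '%s\n' "$urls"

url_count=$(printf '%s\n' "$urls" | grep -c '')
rm -f "$page"
```

Trailing punctuation such as a sentence-ending period directly after a URL would be captured by this pattern, so real inputs may need an extra cleanup step.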
Dates and Timestamps¶
# ISO 8601 date: 2024-03-15
grep -E '\b[0-9]{4}-[0-9]{2}-[0-9]{2}\b' events.log
# Common log timestamp: 15/Mar/2024:10:23:45
grep -E '[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}' access.log
# Syslog timestamp: Mar 15 10:23:45
grep -E '^[A-Z][a-z]{2}\s+[0-9]{1,2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/syslog
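The ISO 8601 pattern above accepts any digits, including impossible dates such as month 13. A slightly tighter sketch constrains the month and day ranges. It still knows nothing about month lengths or leap years, and the sample values are made up.

```shell
dates=$(mktemp)
printf '2024-03-15\n2024-13-45\n2024-00-10\n2024-12-31\n' > "$dates"

# Shape only: four digits, two digits, two digits
rough=$(grep -cxE '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$dates")

# Month 01-12, day 01-31
tight=$(grep -cxE '[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])' "$dates")

echo "rough=$rough tight=$tight"
rm -f "$dates"
```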
UUIDs¶
# UUID pattern (any version, lowercase hex only)
grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' data.json
# Case-insensitive UUID
grep -oiE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' data.json
Log Levels¶
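A minimal sketch for filtering and tallying log levels, assuming the levels appear as uppercase words (DEBUG, INFO, WARN, ERROR, FATAL). The log format and contents here are made up for illustration.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
2024-03-15T10:00:01 INFO service started
2024-03-15T10:00:02 WARN disk usage high
2024-03-15T10:00:03 ERROR connection refused
2024-03-15T10:00:04 INFO retrying
2024-03-15T10:00:05 ERROR connection refused
2024-03-15T10:00:06 ERROR giving up
2024-03-15T10:00:07 INFORMATIONAL not a level
EOF

# -w keeps INFO from matching inside INFORMATIONAL
info_lines=$(grep -cw 'INFO' "$log")

# Severe entries only
severe_lines=$(grep -cwE 'ERROR|FATAL' "$log")

# Tally every level, most frequent first, and keep the top one
top_level=$(grep -owE 'DEBUG|INFO|WARN|ERROR|FATAL' "$log" |
    sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

echo "info=$info_lines severe=$severe_lines top=$top_level"
rm -f "$log"
```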
File Filtering¶
--include and --exclude¶
When searching codebases, you almost always want to skip certain file types:
# Search only Python files
grep -r --include='*.py' 'import requests' ./src/
# Search only YAML and JSON config files
grep -r --include='*.yaml' --include='*.yml' --include='*.json' 'database' ./config/
# Exclude vendor and node_modules directories
grep -r --exclude-dir='vendor' --exclude-dir='node_modules' 'TODO' .
# Exclude binary files and build artifacts
grep -r --exclude-dir='.git' --exclude-dir='build' --exclude='*.o' 'main' .
# Combine include and exclude
grep -r --include='*.go' --exclude-dir='vendor' 'func main' .
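The effect of the filters is easiest to see on a tiny tree. A minimal sketch using a temporary directory whose layout and file names are made up:

```shell
proj=$(mktemp -d)
mkdir -p "$proj/src" "$proj/vendor/lib"
printf 'import requests\n' > "$proj/src/app.py"
printf 'import requests\n' > "$proj/src/notes.txt"
printf 'import requests\n' > "$proj/vendor/lib/dep.py"

# Without filters, all three files match
all_hits=$(grep -rl 'import requests' "$proj" | grep -c '')

# Only *.py files, and never descend into vendor/
py_hits=$(grep -rl --include='*.py' --exclude-dir='vendor' 'import requests' "$proj")

echo "all=$all_hits filtered=$py_hits"
rm -rf "$proj"
```

Quoting the glob matters: an unquoted `*.py` would be expanded by the shell before grep ever sees it.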
Null-Delimited Output¶
# Use -Z for null-delimited filenames (safe for filenames with spaces)
grep -rlZ "pattern" . | xargs -0 sed -i 's/old/new/g'
# Pair find -print0 with xargs -0 so filenames containing spaces reach grep intact
find . -name '*.log' -print0 | xargs -0 grep -l "ERROR"
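A minimal sketch of why the NUL delimiter matters, assuming GNU grep (where `-Z` means "end each filename with a NUL byte") and a made-up file whose name contains a space:

```shell
dir=$(mktemp -d)
printf 'ERROR one\nERROR two\n' > "$dir/app server.log"
printf 'all good\n' > "$dir/quiet.log"

# -l lists matching files, -Z ends each name with a NUL byte,
# and xargs -0 splits on NUL only, so the space in the name survives
error_count=$(grep -rlZ 'ERROR' "$dir" | xargs -0 grep -c 'ERROR')

echo "$error_count"
rm -rf "$dir"
```

Without `-Z` and `-0`, xargs would split `app server.log` into two arguments and the second grep would fail to open either one.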
egrep and fgrep¶
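egrep is equivalent to grep -E (extended regex). fgrep is equivalent to grep -F (fixed strings, no regex). Both names are obsolescent: GNU grep 3.8 and later print a warning when they are invoked, so prefer grep -E and grep -F in new scripts.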
# fgrep treats everything as literal — fast for exact string matching
# Useful when your search string contains regex metacharacters
fgrep 'price=$10.00' receipts.txt
# Same as:
grep -F 'price=$10.00' receipts.txt
grep -F can be faster than regex grep for literal string searches on large files because it skips regex compilation and uses fast substring search algorithms (like Boyer-Moore). It is also safer: metacharacters in the search string cannot change the meaning of the pattern.
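The same search string behaves differently in each mode. A minimal sketch using made-up receipt lines:

```shell
receipts=$(mktemp)
printf '%s\n' 'price=$10.00' 'price=$10x00' > "$receipts"

# -F: every character is literal, so only the exact string matches
fixed=$(grep -cF 'price=$10.00' "$receipts")

# BRE: the . is a wildcard, so the second line matches too
bre=$(grep -c 'price=$10.00' "$receipts")

# ERE: a mid-pattern $ is an end-of-line anchor, so nothing can match
ere=$(grep -cE 'price=$10.00' "$receipts")

echo "fixed=$fixed bre=$bre ere=$ere"
rm -f "$receipts"
```

Only `-F` returns exactly the intended line, which is why it is the safer choice whenever the search string comes from user input or another program.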
Searching Compressed Files¶
# zgrep: search through gzip-compressed files
zgrep "ERROR" /var/log/syslog.2.gz
# zgrep passes most grep flags through to grep
zgrep -c "404" /var/log/nginx/access.log.*.gz
# For bzip2 files
bzgrep "pattern" archive.bz2
# For xz files
xzgrep "pattern" archive.xz
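zgrep is typically a thin wrapper that decompresses a file and pipes it into grep. When rotated logs mix plain and gzip-compressed files, the same idea can be sketched by hand. The function name `search_logs` is hypothetical, and gzip is assumed to be installed.

```shell
# Hypothetical helper: $1 = pattern, remaining args = plain or .gz files
search_logs() {
    pattern=$1
    shift
    for f in "$@"; do
        case $f in
            *.gz) gzip -dc -- "$f" | grep -- "$pattern" ;;
            *)    grep -- "$pattern" "$f" ;;
        esac
    done
}

logdir=$(mktemp -d)
printf 'ERROR today\nINFO today\n' > "$logdir/app.log"
printf 'ERROR yesterday\nERROR also yesterday\n' > "$logdir/app.log.1"
gzip "$logdir/app.log.1"    # produces app.log.1.gz

total=$(search_logs 'ERROR' "$logdir/app.log" "$logdir/app.log.1.gz" | grep -c '')
echo "$total"
rm -rf "$logdir"
```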
grep in Pipelines¶
grep is a filter. It takes input on stdin and emits matching lines on stdout. This makes it composable:
# Find listening ports for a specific process
ss -tlnp | grep nginx
# Find environment variables matching a pattern
env | grep -i proxy
# Extract unique IPs from a log
grep -oE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' access.log | sort -u
# Count HTTP status codes
awk '{print $9}' access.log | grep -E '^[0-9]{3}$' | sort | uniq -c | sort -rn
# Find processes (but not the grep process itself)
ps aux | grep '[n]ginx'
One-liner: ps aux | grep '[n]ginx' is one of the most widely shared Unix idioms. The bracket trick prevents grep from matching its own process line: the regex [n]ginx matches the literal string nginx, but grep's own command line appears in the process list as grep [n]ginx, which does not contain the literal string nginx.
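The trick can be verified without a running nginx by feeding grep the text that the process list would contain. A minimal sketch in which the process lines are made up:

```shell
# What grep's own entry looks like when you run: ps aux | grep nginx
naive_self=$(printf '%s\n' 'alice 203 grep nginx' | grep -c 'nginx')

# What grep's own entry looks like when you run: ps aux | grep '[n]ginx'
bracket_self=$(printf '%s\n' 'alice 203 grep [n]ginx' | grep -c '[n]ginx')

# Real nginx processes are still matched by the bracketed pattern
real_proc=$(printf '%s\n' 'root 101 nginx: master process' | grep -c '[n]ginx')

echo "naive=$naive_self bracket=$bracket_self real=$real_proc"
```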
Modern Alternatives¶
ripgrep (rg)¶
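ripgrep is a grep replacement written in Rust. It is often several times faster than GNU grep for recursive searches of large source trees, though the gap depends on the workload and on how many files its ignore rules let it skip.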
# Basic search (recursive by default)
rg "TODO" ./src/
# Respect .gitignore automatically (default behavior)
rg "import" .
# Search specific file types
rg -t py "import requests"
rg -t go "func main"
# Show context
rg -C 3 "panic" ./src/
# Fixed string search (single quotes stop the shell from expanding $1)
rg -F 'price=$10.00' .
# PCRE2 support
rg -P '(?<=port=)\d+' config.ini
Why ripgrep is faster:
- Respects .gitignore by default, skipping node_modules/, vendor/, .git/, build artifacts
- Parallel traversal: uses multiple threads for directory walking and searching
- Memory-mapped I/O: avoids redundant copies of file data
- Optimized regex engine: uses the Rust regex crate, which compiles to DFA where possible
- Smart defaults: skips binary files, hidden files, and common junk directories
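Because ripgrep is not installed everywhere, scripts that want its speed often fall back to grep. A minimal sketch of such a wrapper. The function name `search_tree` is hypothetical, and both branches are asked for `path:line:text` output so callers see the same format.

```shell
# Hypothetical wrapper: $1 = pattern, $2 = directory
search_tree() {
    if command -v rg >/dev/null 2>&1; then
        rg --no-heading -n -- "$1" "$2"
    else
        grep -rn --exclude-dir='.git' -- "$1" "$2"
    fi
}

tree=$(mktemp -d)
mkdir -p "$tree/src"
printf 'package main\n// TODO: handle errors\n' > "$tree/src/main.go"

hits=$(search_tree 'TODO' "$tree")
echo "$hits"
rm -rf "$tree"
```

The two branches are not fully equivalent: rg skips hidden and ignored files, while the grep fallback only skips .git/, so results can differ on real repositories.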
ag (The Silver Searcher)¶
ag was the first popular "smart grep" tool but has been largely superseded by ripgrep in performance benchmarks.
When to Use What¶
| Tool | Best For |
|---|---|
| `grep` | Available everywhere, scripts that must be portable, simple searches |
| `grep -P` | Advanced regex on Linux systems (PCRE features) |
| `rg` | Codebase search (fast, respects gitignore, great defaults) |
| `grep -F` | High-speed literal string matching on huge files |
| `zgrep` | Searching compressed log archives |
Regex Engine Comparison¶
Understanding the engine differences prevents portability bugs:
| Feature | POSIX BRE | POSIX ERE | PCRE | PCRE2 |
|---|---|---|---|---|
| Tool | `grep` | `grep -E` | `grep -P` | `rg -P` |
| Lazy quantifiers | No | No | `*?`, `+?` | `*?`, `+?` |
| Lookahead | No | No | `(?=...)`, `(?!...)` | `(?=...)`, `(?!...)` |
| Lookbehind | No | No | `(?<=...)`, `(?<!...)` | `(?<=...)`, `(?<!...)` |
| Atomic groups | No | No | `(?>...)` | `(?>...)` |
| Unicode `\p{L}` | No | No | Yes | Yes |
| `\d` | No | No | Yes | Yes |
| `\w`, `\s` | GNU extension | GNU extension | Yes | Yes |
| Named groups | No | No | `(?P<name>...)` | `(?P<name>...)` |
PCRE is not available on macOS's default BSD grep. If you need PCRE on macOS, install GNU grep via brew install grep (available as ggrep) or use perl -ne.
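A script that has to run on both Linux and macOS can probe for -P support and fall back to ERE. A minimal sketch: the helper names `has_pcre` and `extract_port` and the sample config are hypothetical.

```shell
# Succeeds only if the local grep accepts -P
has_pcre() {
    printf 'x\n' | grep -qP 'x' 2>/dev/null
}

# Print the digits after "port=" using PCRE when available,
# otherwise portable ERE plus cut. $1 = file
extract_port() {
    if has_pcre; then
        grep -oP '(?<=port=)\d+' "$1"
    else
        grep -oE 'port=[0-9]+' "$1" | cut -d= -f2
    fi
}

conf=$(mktemp)
printf 'host=db01\nport=5432\n' > "$conf"

port=$(extract_port "$conf")
echo "$port"
rm -f "$conf"
```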
Putting It Together¶
A typical investigation flow during an incident:
# 1. Find which log files contain the error
grep -rl "connection reset" /var/log/app/
# 2. Count occurrences per file to find the hotspot
grep -rc "connection reset" /var/log/app/ | sort -t: -k2 -rn | head
# 3. Look at the context around errors in the worst file
grep -n -C 5 "connection reset" /var/log/app/worker-03.log | head -100
# 4. Extract the timestamps of errors to see the pattern (errors per minute)
grep "connection reset" /var/log/app/worker-03.log | grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}' | uniq -c
# 5. Pull out the upstream IPs involved
grep "connection reset" /var/log/app/worker-03.log | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort | uniq -c | sort -rn
Each step narrows the search. Start broad (which files?), quantify (how many?), examine (what context?), extract (what data?), correlate (what pattern?). This is the investigative loop that grep enables.
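The loop can be rehearsed end to end on synthetic data. A minimal sketch in which the log directory, file names, log format, and upstream addresses are all made up:

```shell
logs=$(mktemp -d)
cat > "$logs/worker-01.log" <<'EOF'
2024-03-15T10:01:07 INFO request ok upstream=10.0.0.5
2024-03-15T10:02:11 ERROR connection reset upstream=10.0.0.7
EOF
cat > "$logs/worker-03.log" <<'EOF'
2024-03-15T10:01:02 ERROR connection reset upstream=10.0.0.7
2024-03-15T10:01:40 ERROR connection reset upstream=10.0.0.7
2024-03-15T10:02:15 INFO request ok upstream=10.0.0.5
2024-03-15T10:02:48 ERROR connection reset upstream=10.0.0.9
EOF

# Quantify: which file has the most matches?
worst=$(grep -rc "connection reset" "$logs" | sort -t: -k2 -rn | head -n 1 | cut -d: -f1)

# Correlate: errors per minute in that file, as minute=count
per_minute=$(grep "connection reset" "$worst" |
    grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}' |
    sort | uniq -c | awk '{print $2 "=" $1}')

# Extract: the upstream IP seen most often in the errors
top_ip=$(grep "connection reset" "$worst" |
    grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' |
    sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

printf 'worst=%s\ntop_ip=%s\n%s\n' "$worst" "$top_ip" "$per_minute"
rm -rf "$logs"
```

The `cut -d: -f1` step assumes the log paths themselves contain no colons.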