
grep & Regular Expressions - Primer

Why This Matters

Every log file, config file, codebase, and data stream you touch in operations is text. grep is the single most-used tool for finding patterns in that text. You will use it hundreds of times a day — in terminals, in scripts, in pipelines, in CI/CD. An engineer who knows grep and regex well can diagnose problems in minutes that take others hours. An engineer who does not understand regex will write fragile scripts that break on edge cases and miss critical log entries during outages.

grep Basics

Name origin: grep stands for g/re/p — a command from the ed line editor meaning "globally search for a regular expression and print matching lines." Ken Thompson wrote it at Bell Labs in 1973. The name stuck as Unix shorthand and became a verb: "grep it."

grep searches for patterns in files or standard input and prints matching lines.

# Basic usage: search for a literal string
grep "Connection refused" /var/log/syslog

# Search in multiple files
grep "ERROR" /var/log/*.log

# Search recursively through a directory
grep -r "TODO" ./src/

Essential Flags

Flag	Purpose	Example
`-i`	Case-insensitive match	`grep -i "error" app.log`
`-r`	Recursive search through directories	`grep -r "import os" ./src/`
`-n`	Show line numbers	`grep -n "segfault" /var/log/kern.log`
`-l`	Show only filenames (not matching lines)	`grep -rl "deprecated" ./lib/`
`-c`	Count matching lines per file	`grep -c "404" access.log`
`-v`	Invert match (show non-matching lines)	`grep -v "^#" config.ini`
`-w`	Match whole words only	`grep -w "port" config.yaml`
`-o`	Print only the matched part, not the whole line	`grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' access.log`
`-A N`	Show N lines after each match	`grep -A 5 "Exception" app.log`
`-B N`	Show N lines before each match	`grep -B 3 "FATAL" app.log`
`-C N`	Show N lines of context (before and after)	`grep -C 2 "timeout" app.log`

Context flags (-A, -B, -C) are critical during incident response. A log line saying FATAL means nothing without the stack trace that follows it or the request ID that precedes it.

# Show the 10 lines after every OOM kill message
grep -A 10 "Out of memory" /var/log/kern.log

# Show 3 lines before and after every connection timeout
grep -C 3 "Connection timed out" /var/log/app/server.log
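To make the effect of the context flags concrete, here is a minimal sketch against a made-up log. The file contents, the request ID format, and the variable names are assumptions for illustration only.

```shell
# Hypothetical sample log; contents are invented for this sketch.
log=$(mktemp)
cat > "$log" <<'EOF'
10:00:01 INFO request_id=abc123 start
10:00:02 FATAL unhandled exception
10:00:02   at handler.process(handler.py:42)
10:00:02   at server.dispatch(server.py:17)
10:00:03 INFO request_id=def456 start
EOF

# -B 1 pulls in the request ID line, -A 2 pulls in the stack trace.
context=$(grep -B 1 -A 2 "FATAL" "$log")
printf '%s\n' "$context"

# One line before + the match + two lines after.
context_lines=$(printf '%s\n' "$context" | wc -l | tr -d ' ')
echo "lines returned: $context_lines"
rm -f "$log"
```

When several matches are far apart, grep prints a `--` separator line between the context groups, which is worth remembering before piping the result into a line counter.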

Exit Codes

grep communicates through exit codes, which matters enormously in scripts:

Exit Code	Meaning
`0`	At least one match found
`1`	No matches found
`2`	Error (bad syntax, file not found, permission denied)

# This is how you use grep in conditionals
if grep -q "running" /var/run/app.pid; then
    echo "App is running"
else
    echo "App is not running"
fi

The -q (quiet) flag suppresses output and just sets the exit code. Essential for scripting.
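A plain `if grep -q` treats "no match" and "grep failed" the same way. The sketch below separates the three exit codes; `check_pattern` is a hypothetical helper name and the sample file is created only for the demonstration.

```shell
# check_pattern is a hypothetical helper used only in this sketch.
# $1 = pattern, $2 = file
check_pattern() {
    grep -q -- "$1" "$2" 2>/dev/null
    case $? in
        0) echo "match" ;;
        1) echo "no-match" ;;
        *) echo "error" ;;   # 2 or higher: unreadable file, bad pattern, ...
    esac
}

sample=$(mktemp)
echo "status: running" > "$sample"

found=$(check_pattern "running" "$sample")
missing=$(check_pattern "stopped" "$sample")
broken=$(check_pattern "running" "$sample.does-not-exist")
echo "$found $missing $broken"
rm -f "$sample"
```

Distinguishing exit code 2 matters in health checks: a missing PID file is a different problem from a PID file that does not contain the expected text.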

BRE vs ERE vs PCRE

grep supports three regex dialects. Not understanding which one you are using is the #1 source of regex bugs.

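BRE: Basic Regular Expressions (default)

In BRE, the characters +, ?, {, }, (, ), and | are literal. To use them as operators, you must escape them with a backslash. The forms \{ \} and \( \) are standard POSIX BRE, while \+, \?, and \| are GNU extensions that other grep implementations may not support.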

# BRE: must escape + and | to use as metacharacters
grep 'error\|warning' app.log              # alternation
grep 'go\+d' words.txt                     # one or more 'o'
grep '\(foo\)\(bar\) \1' data.txt          # backreference

ERE: Extended Regular Expressions (`-E` or `egrep`)

ERE flips the convention: metacharacters work without escaping. This is what most people expect.

# ERE: metacharacters work naturally
grep -E 'error|warning' app.log            # alternation
grep -E 'go+d' words.txt                   # one or more 'o'
grep -E '(foo)(bar) \1' data.txt           # backreference
grep -E 'https?://' urls.txt               # optional 's'

Use -E by default unless you have a specific reason not to. ERE syntax is more readable, less error-prone, and matches what you use in most programming languages.

Remember: Mnemonic for grep flags: "-E for Extended, -F for Fixed, -P for Perl." When your regex is not matching, the first thing to check is whether you are using the right dialect — BRE silently treats +, ?, and | as literal characters, which is the #1 source of "why doesn't my regex work?"
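The dialect difference is easiest to see by running the same pattern both ways. This is a small sketch with two invented input lines; the variable names are arbitrary.

```shell
# Two sample lines: one contains a literal "+", the other a run of "o".
sample=$(printf '%s\n' 'go+d' 'good')

# BRE: "+" is an ordinary character, so only the literal "go+d" line matches.
bre_match=$(printf '%s\n' "$sample" | grep 'go+d')

# ERE: "+" means "one or more o", so "good" matches and "go+d" does not.
ere_match=$(printf '%s\n' "$sample" | grep -E 'go+d')

echo "BRE matched: $bre_match"
echo "ERE matched: $ere_match"
```

Neither command reports an error, which is why a dialect mix-up tends to show up as missing results rather than as a failure.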

PCRE: Perl-Compatible Regular Expressions (`-P`)

PCRE adds lookahead, lookbehind, non-greedy quantifiers, and other advanced features. Available via grep -P on GNU grep (not on macOS default grep).

# Lookahead: match "error" only if followed by a number
grep -P 'error(?=\s+\d+)' app.log

# Lookbehind: match a port number after "port="
grep -P '(?<=port=)\d+' config.ini

# Non-greedy quantifier
grep -P '".*?"' data.json                  # match shortest quoted string

# Lookbehind with -o: print only the username that follows user=
grep -oP '(?<=user=)\w+' auth.log

Quick Reference: Which Dialect?

Feature	BRE	ERE (`-E`)	PCRE (`-P`)
`.` (any char)	Yes	Yes	Yes
`*` (zero or more)	Yes	Yes	Yes
`+` (one or more)	`\+`	`+`	`+`
`?` (zero or one)	`\?`	`?`	`?`
`|` (alternation)	`\|`	`|`	`|`
`()` (groups)	`\(\)`	`()`	`()`
`{n,m}` (quantifier)	`\{n,m\}`	`{n,m}`	`{n,m}`
Backreferences	`\1`	`\1`	`\1`
Lookahead/lookbehind	No	No	Yes
Non-greedy `*?`, `+?`	No	No	Yes
`\w`, `\s`	GNU extension	GNU extension	Yes
`\d`	No	No	Yes

Regex Fundamentals

Character Classes

[abc]         # Match a, b, or c
[a-z]         # Match any lowercase letter
[A-Za-z0-9]  # Match any alphanumeric character
[^0-9]        # Match anything that is NOT a digit

POSIX character classes work in all dialects:

[[:alpha:]]   # Alphabetic characters
[[:digit:]]   # Digits (0-9)
[[:alnum:]]   # Alphanumeric
[[:space:]]   # Whitespace (space, tab, newline)
[[:upper:]]   # Uppercase letters
[[:lower:]]   # Lowercase letters
[[:punct:]]   # Punctuation characters

Anchors

^             # Start of line
$             # End of line
\b            # Word boundary (PCRE; GNU extension in BRE/ERE)

# Lines that start with a comment
grep '^#' config.ini

# Lines that end with a semicolon
grep ';$' source.c

# Empty lines
grep '^$' file.txt

# Lines containing only whitespace
grep -E '^\s*$' file.txt
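Anchors combine well with -v to strip noise from a config file. The following sketch uses a made-up config and the POSIX class `[[:space:]]` instead of `\s` so it does not depend on GNU extensions; the file contents are assumptions.

```shell
conf=$(mktemp)
printf '%s\n' '# main settings' '' 'port = 8080' '   ' \
    '  # indented comment' 'host = example.internal' > "$conf"

# Drop lines that are empty, whitespace-only, or comments (optionally indented).
active=$(grep -Ev '^[[:space:]]*(#|$)' "$conf")
printf '%s\n' "$active"

active_count=$(printf '%s\n' "$active" | wc -l | tr -d ' ')
rm -f "$conf"
```

The `^` anchor is what keeps a line such as `color = #fff` from being discarded: the `#` only counts when nothing but whitespace precedes it.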

Quantifiers

*             # Zero or more of preceding element
+             # One or more (ERE/PCRE)
?             # Zero or one (ERE/PCRE)
{n}           # Exactly n
{n,}          # n or more
{n,m}         # Between n and m

# Match IP-like patterns (rough)
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log

# Match lines with 3+ consecutive digits
grep -E '[0-9]{3,}' data.txt

Groups and Alternation

# Alternation: match either pattern
grep -E 'error|warning|critical' app.log

# Grouping: apply quantifier to group
grep -E '(ab)+' data.txt                   # match "ab", "abab", "ababab"

# Backreference: match a whole word that appears twice on a line (\b and \w are GNU extensions)
grep -E '\b(\w+)\b.*\b\1\b' text.txt
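Where the word boundaries sit around a backreference changes what counts as a "repeat". The sketch below compares an unbounded backreference with one bounded on both sides, using two invented lines and assuming GNU grep for `\b` and `\w`.

```shell
text=$(printf '%s\n' 'the other day' 'it is is broken')

# Unbounded: "the" is found again inside "other", so both lines match.
loose=$(printf '%s\n' "$text" | grep -cE '(\w+).*\1')

# Bounded on both sides: only a whole word repeated as a whole word matches.
strict=$(printf '%s\n' "$text" | grep -cE '\b(\w+)\b.*\b\1\b')

echo "loose=$loose strict=$strict"
```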

The Dot: Universal Wildcard

. matches any single character except newline. It is easy to forget this when a pattern contains dots that are meant literally, as in IP addresses, version numbers, or file names.

# Matches "cat", "car", "can", "cap", ...
grep 'ca.' words.txt

# To match a literal dot, escape it
grep '192\.168\.1\.1' hosts.txt
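The cost of an unescaped dot is easy to demonstrate. This sketch uses three made-up host lines; combining escaped dots with -w is one possible way to pin the match down, not the only one.

```shell
hosts=$(printf '%s\n' '192.168.1.1 gateway' '192.168.1.10 printer' '192x168y1z1 typo')

# Unescaped dots match any character, and the pattern is also a prefix of
# 192.168.1.10, so all three lines match.
unescaped=$(printf '%s\n' "$hosts" | grep -c '192.168.1.1')

# Escaped dots plus -w: the match must be the exact address as a whole word.
exact=$(printf '%s\n' "$hosts" | grep -cw '192\.168\.1\.1')

echo "unescaped=$unescaped exact=$exact"
```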

Common Regex Patterns

These are patterns you will reach for repeatedly in production:

IP Addresses

# IPv4 (rough match: also accepts invalid octets like 999)
grep -E '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' access.log

# IPv4 (strict — only valid octets 0-255)
grep -P '\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b' access.log

# IPv4 with CIDR notation
grep -E '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/[0-9]{1,2}\b' routes.txt
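Where `grep -P` is unavailable, the strict octet check can also be written in ERE. The sketch below validates a single string rather than scanning a log; `is_ipv4` is a hypothetical helper name, and -x is used so the whole input must match.

```shell
# is_ipv4 is a hypothetical helper: succeeds only for a dotted quad
# whose four octets are each in the range 0-255.
is_ipv4() {
    octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
    printf '%s\n' "$1" | grep -qxE "$octet(\.$octet){3}"
}

for candidate in 192.168.1.1 10.0.0.256 999.1.1.1 1.2.3; do
    if is_ipv4 "$candidate"; then
        echo "$candidate valid"
    else
        echo "$candidate invalid"
    fi
done
```

Building the pattern from a reusable `octet` variable keeps the alternation readable and avoids repeating it four times by hand.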

Email Addresses

# Practical email extraction (not RFC-complete, but covers real-world cases)
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt

URLs

# HTTP/HTTPS URLs
grep -oE 'https?://[a-zA-Z0-9./?=_&%-]+' page.html

Dates and Timestamps

# ISO 8601 date: 2024-03-15
grep -E '\b[0-9]{4}-[0-9]{2}-[0-9]{2}\b' events.log

# Common log timestamp: 15/Mar/2024:10:23:45
grep -E '[0-9]{2}/[A-Z][a-z]{2}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2}' access.log

# Syslog timestamp: Mar 15 10:23:45
grep -E '^[A-Z][a-z]{2}\s+[0-9]{1,2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/syslog

UUIDs

# UUID pattern (any version, lowercase hex only)
grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' data.json

# Case-insensitive UUID
grep -oiE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' data.json

Log Levels

# Standard severity levels
grep -E '\b(EMERG|ALERT|CRIT|ERROR|WARN|NOTICE|INFO|DEBUG)\b' app.log

File Filtering

--include and --exclude

When searching codebases, you almost always want to skip certain file types:

# Search only Python files
grep -r --include='*.py' 'import requests' ./src/

# Search only YAML and JSON config files
grep -r --include='*.yaml' --include='*.yml' --include='*.json' 'database' ./config/

# Exclude vendor and node_modules directories
grep -r --exclude-dir='vendor' --exclude-dir='node_modules' 'TODO' .

# Exclude the .git and build directories and object files (add -I to skip binary files)
grep -r --exclude-dir='.git' --exclude-dir='build' --exclude='*.o' 'main' .

# Combine include and exclude
grep -r --include='*.go' --exclude-dir='vendor' 'func main' .
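The filters can be checked safely on a throwaway tree before pointing them at a real repository. The directory layout and file contents below are invented for this sketch.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/src" "$tree/vendor/lib"
echo 'func main() {}' > "$tree/src/app.go"
echo 'func main() {}' > "$tree/vendor/lib/dep.go"
echo 'func main is mentioned here' > "$tree/src/notes.md"

# Only .go files, and never descend into vendor/.
hits=$(cd "$tree" && grep -rl --include='*.go' --exclude-dir='vendor' 'func main' .)
echo "$hits"
rm -rf "$tree"
```

All three files contain the search string, but only `./src/app.go` survives both filters.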

Null-Delimited Output

# Use -Z for null-delimited filenames (safe for filenames with spaces)
grep -rlZ "pattern" . | xargs -0 sed -i 's/old/new/g'

# Pass null-delimited filenames from find to grep with xargs -0 (grep's own -z flag is different: it treats input lines as NUL-terminated)
find . -name '*.log' -print0 | xargs -0 grep -l "ERROR"
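A file name containing a space shows why the NUL delimiter matters. This is a minimal sketch with invented file names; `basename` stands in for whatever command would normally receive the file list.

```shell
dir=$(mktemp -d)
echo 'ERROR disk full' > "$dir/app server.log"
echo 'all good' > "$dir/other.log"

# -Z ends each file name with NUL; xargs -0 splits on NUL only,
# so "app server.log" reaches basename as a single argument.
matched=$(grep -rlZ 'ERROR' "$dir" | xargs -0 -n 1 basename)
echo "$matched"
rm -rf "$dir"
```

Without -Z and -0, xargs would split the name on the space and hand the command two arguments that do not exist as files.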

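egrep and fgrep

egrep is equivalent to grep -E (extended regex). fgrep is equivalent to grep -F (fixed strings, no regex). Both names are obsolescent: GNU grep 3.8 and later print a warning when they are invoked, so prefer grep -E and grep -F in new scripts.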

# fgrep treats everything as literal — fast for exact string matching
# Useful when your search string contains regex metacharacters
fgrep 'price=$10.00' receipts.txt

# Same as:
grep -F 'price=$10.00' receipts.txt

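grep -F skips regex interpretation entirely, so every character in the pattern is literal. It can also be faster than a regex search on large files because fixed-string matching can use algorithms such as Boyer-Moore, although GNU grep applies similar optimizations to many patterns that contain no metacharacters, so measure before relying on the speedup.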
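Beyond speed, -F changes which lines match. The sketch below runs the same pattern with and without -F over two invented receipt lines; the counts assume the default BRE dialect, where a `$` in the middle of a pattern is literal but `.` is still a wildcard.

```shell
receipts=$(printf '%s\n' 'price=$10.00' 'price=$10x00')

# Default (BRE) search: "." matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$receipts" | grep -c 'price=$10.00')

# -F search: every character is literal, so only the exact string matches.
fixed_hits=$(printf '%s\n' "$receipts" | grep -cF 'price=$10.00')

echo "regex=$regex_hits fixed=$fixed_hits"
```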

Searching Compressed Files

# zgrep: search through gzip-compressed files
zgrep "ERROR" /var/log/syslog.2.gz

# zgrep passes most grep flags through (here -c counts matching lines per file)
zgrep -c "404" /var/log/nginx/access.log.*.gz

# For bzip2 files
bzgrep "pattern" archive.bz2

# For xz files
xzgrep "pattern" archive.xz
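To try zgrep without touching real archives, a rotated log can be simulated. The file name and log lines below are assumptions, and the sketch presumes the `gzip` and `zgrep` utilities are installed.

```shell
dir=$(mktemp -d)
printf '%s\n' 'GET /a 200' 'GET /b 404' 'GET /c 404' > "$dir/access.log.1"
gzip "$dir/access.log.1"   # produces access.log.1.gz

# zgrep decompresses on the fly; the .gz file itself is left untouched.
not_found=$(zgrep -c ' 404' "$dir/access.log.1.gz")
echo "$not_found"
rm -rf "$dir"
```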

grep in Pipelines

grep is a filter. It takes input on stdin and emits matching lines on stdout. This makes it composable:

# Find listening ports for a specific process
ss -tlnp | grep nginx

# Find environment variables matching a pattern
env | grep -i proxy

# Extract unique IPs from a log
grep -oE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' access.log | sort -u

# Count HTTP status codes
awk '{print $9}' access.log | grep -E '^[0-9]{3}$' | sort | uniq -c | sort -rn

# Find processes (but not the grep process itself)
ps aux | grep '[n]ginx'

One-liner: ps aux | grep '[n]ginx' is one of the most widely shared Unix idioms, and it prevents grep from matching its own process line. The regex [n]ginx matches the literal string nginx, but grep's own entry in the process list reads grep [n]ginx, which does not contain the string nginx.
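The bracket trick can be verified without a running nginx by feeding grep a simulated process list. The two-line listings below are invented; the second line in each stands for the entry that grep itself would add to `ps aux`.

```shell
# Simulated process listings, including grep's own command line.
ps_plain=$(printf '%s\n' 'root  101 nginx: master process' 'me    202 grep nginx')
ps_trick=$(printf '%s\n' 'root  101 nginx: master process' 'me    202 grep [n]ginx')

# Plain pattern: grep's own line contains "nginx", so it is counted too.
plain_hits=$(printf '%s\n' "$ps_plain" | grep -c 'nginx')

# Bracket pattern: grep's own line contains "[n]ginx", which the regex does not match.
trick_hits=$(printf '%s\n' "$ps_trick" | grep -c '[n]ginx')

echo "plain=$plain_hits trick=$trick_hits"
```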

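Modern Alternatives

ripgrep (rg)

ripgrep is a grep replacement written in Rust. It is often considerably faster than GNU grep for recursive searches, though the gap depends on the size of the tree, the pattern, and how much the ignore rules let it skip.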

# Basic search (recursive by default)
rg "TODO" ./src/

# Respect .gitignore automatically (default behavior)
rg "import" .

# Search specific file types
rg -t py "import requests"
rg -t go "func main"

# Show context
rg -C 3 "panic" ./src/

# Fixed string search (single quotes stop the shell from expanding $1)
rg -F 'price=$10.00' .

# PCRE2 support
rg -P '(?<=port=)\d+' config.ini

Why ripgrep is faster:

Respects .gitignore by default, so paths the repository ignores (often node_modules/, vendor/, and build artifacts) are never read
Parallel traversal: uses multiple threads for directory walking and searching
Adaptive I/O: chooses between memory maps and buffered reads depending on the kind of search
Optimized regex engine: uses the Rust regex crate, which is built on finite automata and literal optimizations
Smart defaults: skips binary files and hidden files and directories (including .git/)

ag (The Silver Searcher)

# Similar to ripgrep but older
ag "pattern" ./src/
ag --python "import" .

ag was the first popular "smart grep" tool but has been largely superseded by ripgrep in performance benchmarks.

When to Use What

Tool	Best For
`grep`	Available everywhere, scripts that must be portable, simple searches
`grep -P`	Advanced regex on Linux systems (PCRE features)
`rg`	Codebase search (fast, respects gitignore, great defaults)
`grep -F`	High-speed literal string matching on huge files
`zgrep`	Searching compressed log archives

Regex Engine Comparison

Understanding the engine differences prevents portability bugs:

Feature	POSIX BRE	POSIX ERE	PCRE	PCRE2
Tool	`grep`	`grep -E`	`grep -P`	`rg -P`
Lazy quantifiers	No	No	`*?`, `+?`	`*?`, `+?`
Lookahead	No	No	`(?=...)`, `(?!...)`	`(?=...)`, `(?!...)`
Lookbehind	No	No	`(?<=...)`, `(?<!...)`	`(?<=...)`, `(?<!...)`
Atomic groups	No	No	`(?>...)`	`(?>...)`
Unicode `\p{L}`	No	No	Yes	Yes
`\w`, `\s`	No (GNU grep: yes)	No (GNU grep: yes)	Yes	Yes
`\d`	No	No	Yes	Yes
Named groups	No	No	`(?P<name>...)`	`(?P<name>...)`

PCRE is not available on macOS's default BSD grep. If you need PCRE on macOS, install GNU grep via brew install grep (available as ggrep) or use perl -ne.
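Scripts that must run on both Linux and macOS can probe for -P support and fall back to ERE. This is a sketch under assumptions: `extract_port` is a hypothetical helper, and the `port=` key format is borrowed from the earlier lookbehind example.

```shell
# extract_port is a hypothetical helper; $1 is a config file with key=value lines.
extract_port() {
    if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
        # grep with PCRE support: the lookbehind keeps "port=" out of the output.
        grep -oP '(?<=port=)\d+' "$1"
    else
        # Fallback without -P: match with ERE, then strip the prefix with sed.
        grep -oE 'port=[0-9]+' "$1" | sed 's/^port=//'
    fi
}

conf=$(mktemp)
printf '%s\n' 'host=db1' 'port=5432' > "$conf"
port=$(extract_port "$conf")
echo "$port"
rm -f "$conf"
```

Probing with a trivial pattern is cheap, and both branches produce the same output, so callers do not need to know which one ran.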

Putting It Together

A typical investigation flow during an incident:

# 1. Find which log files contain the error
grep -rl "connection reset" /var/log/app/

# 2. Count occurrences per file to find the hotspot
grep -rc "connection reset" /var/log/app/ | sort -t: -k2 -rn | head

# 3. Look at the context around errors in the worst file
grep -n -C 5 "connection reset" /var/log/app/worker-03.log | head -100

# 4. Extract the timestamps of the errors to see the pattern (count per minute)
grep "connection reset" /var/log/app/worker-03.log | grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}' | uniq -c

# 5. Pull out the upstream IPs involved
grep "connection reset" /var/log/app/worker-03.log | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort | uniq -c | sort -rn

Each step narrows the search. Start broad (which files?), quantify (how many?), examine (what context?), extract (what data?), correlate (what pattern?). This is the investigative loop that grep enables.
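The loop can be rehearsed end to end on synthetic data. The sketch below invents two small worker logs (file names, timestamps, and addresses are all assumptions) and runs the counting, per-minute, and top-address steps against them.

```shell
logdir=$(mktemp -d)
printf '%s\n' \
    '2024-03-15T10:23:01 connection reset by 10.0.0.5' \
    '2024-03-15T10:23:40 connection reset by 10.0.0.5' \
    '2024-03-15T10:24:02 connection reset by 10.0.0.9' > "$logdir/worker-03.log"
printf '%s\n' \
    '2024-03-15T10:23:05 connection reset by 10.0.0.5' \
    '2024-03-15T10:23:06 request ok' > "$logdir/worker-01.log"

# Quantify: count matches per file and keep the file with the most.
hotspot=$(grep -rc 'connection reset' "$logdir" | sort -t: -k2 -rn | head -n 1 | cut -d: -f1)

# Correlate: errors per minute in the hotspot file.
per_minute=$(grep 'connection reset' "$hotspot" \
    | grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}' | uniq -c)

# Extract: the most frequent peer address.
top_ip=$(grep 'connection reset' "$hotspot" \
    | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort | uniq -c | sort -rn \
    | head -n 1 | awk '{print $2}')

hotspot_name=$(basename "$hotspot")
minutes=$(printf '%s\n' "$per_minute" | wc -l | tr -d ' ')
echo "hotspot=$hotspot_name top_ip=$top_ip minutes=$minutes"
rm -rf "$logdir"
```

Splitting on `:` to read the per-file count assumes the log paths themselves contain no colons.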

Prerequisites

Linux Ops (Topic Pack, L0)

Related Content

Advanced Bash for Ops (Topic Pack, L1) — Bash / Shell Scripting
Bash Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Bash / Shell Scripting
Bash Flashcards (CLI) (flashcard_deck, L1) — Bash / Shell Scripting
Cron & Job Scheduling (Topic Pack, L1) — Bash / Shell Scripting
Environment Variables (Topic Pack, L1) — Bash / Shell Scripting
Fleet Operations at Scale (Topic Pack, L2) — Bash / Shell Scripting
LPIC / LFCS Exam Preparation (Topic Pack, L2) — Bash / Shell Scripting
Linux Ops (Topic Pack, L0) — Bash / Shell Scripting
Linux Ops Drills (Drill, L0) — Bash / Shell Scripting
Linux Text Processing (Topic Pack, L1) — Bash / Shell Scripting
