Bash: The Patterns That Matter

Topics: bash scripting, process management, signals, file descriptors, POSIX compatibility
Level: L1–L2 (Foundations → Operations)
Time: 75–90 minutes
Prerequisites: None (but you'll get more from it if you already write Bash daily)
The Mission¶
Your team inherited a deploy script. It's 200 lines of Bash that has been "working" in production for two years. Nobody wants to touch it. Last week it ran twice simultaneously and corrupted the deploy state. The week before, a failed health check was silently ignored and bad code shipped. And last month, someone hit Ctrl+C mid-deploy and the temp files never got cleaned up.
Your job: audit this script. Harden it. Make it the kind of Bash that survives 3 AM, survives concurrent cron jobs, survives the intern hitting Ctrl+C.
Along the way, you'll formalize patterns you probably already use intuitively — and discover a few you didn't know existed.
Part 1: The Safety Net — set -euo pipefail¶
Every production script starts here:
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
You've seen this a thousand times. But do you know when each flag betrays you?
-e (errexit): The flag with more exceptions than rules¶
set -e says "exit on any non-zero return code." What it doesn't say: the Bash manual
lists six contexts where -e is silently ignored. The exact rules span over 500 words in
the man page and have been called "the most misunderstood feature in shell scripting."
set -e
# This WILL exit the script:
false
# This will NOT exit — command is in an if condition:
if false; then echo "won't print"; fi
# This will NOT exit — left side of && or ||:
false && echo "nope"
false || echo "this runs, no exit"
# THIS is the trap that bites everyone:
local result=$(failing_command) # Always returns 0!
Gotcha: `local result=$(failing_command)` swallows the exit code. `local` is itself a command, and its exit code (0 — it created the variable successfully) masks the exit code of `failing_command`. The Bash manual documents this, but almost nobody reads that far.
Before (broken):
my_function() {
local data=$(curl -sf "https://api.example.com/deploy-status") # exit code swallowed by local
echo "${data}"
}
After (correct):
my_function() {
local data
data=$(curl -sf "https://api.example.com/deploy-status")
echo "${data}"
}
Separate declaration from assignment. Always.
-u (nounset): When it bites¶
-u exits on unset variables — exactly what you want. Until you write a script that checks
for optional environment variables:
set -u
# This exits with "DB_REPLICA_HOST: unbound variable":
echo "${DB_REPLICA_HOST}"
# Fix: use default values
echo "${DB_REPLICA_HOST:-}" # Empty string if unset
echo "${DB_REPLICA_HOST:-localhost}" # Default if unset
Also bites with $@ in functions that accept zero arguments (fixed in Bash 4.4+, but you
might be running Bash 4.2 on RHEL 7).
-o pipefail: The one you forget to test¶
Without pipefail, a pipeline's exit status is the last command's status. Meaning:
# Without pipefail: exit status is 0 (wc succeeded)
curl -sf https://broken.url | wc -l
echo $? # 0 — looks fine!
# With pipefail: exit status is the failing curl
set -o pipefail
curl -sf https://broken.url | wc -l
echo $? # curl's exit code (non-zero) — now you see the failure
The IFS line nobody explains¶
The strict-mode preamble typically ends with IFS=$'\n\t'. Default IFS is space, tab, newline — this line removes space from the list. Why? Because word splitting on spaces is the #1 source of Bash bugs: filenames with spaces, paths with spaces, arguments with spaces. Removing space from IFS makes unquoted expansion break only on newlines and tabs. It's a safety net, not a replacement for quoting.
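A quick way to see the difference is to count how many words an unquoted expansion produces under each IFS. This is a minimal demonstration using `set --` (deliberately unquoted, to expose the splitting); real scripts should still quote:

```shell
#!/usr/bin/env bash
# Count the words an unquoted expansion produces under each IFS.
items="alpha beta gamma"

IFS=$' \t\n'            # default IFS: space, tab, newline
set -- $items           # unquoted on purpose, to show the splitting
default_count=$#        # 3 — split on the spaces

IFS=$'\n\t'             # strict-mode IFS: space removed
set -- $items           # same expansion, different splitting rules
strict_count=$#         # 1 — the spaces no longer split

echo "default IFS: ${default_count} words; strict IFS: ${strict_count} word"
```

With the default IFS the string splits into three words; with `IFS=$'\n\t'` the spaces survive and you get one word back.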
Flashcard Check: Strict Mode¶
| Question | Answer |
|---|---|
| `local x=$(false); echo $?` — what prints? | `0` — `local` masks the exit code |
| Name two contexts where `set -e` is ignored | `if` conditions; left side of `&&`/`\|\|` |
| What does `pipefail` change about `cmd1 \| cmd2`? | Exit status becomes the first failure, not the last command's status |
| What does `${VAR:-default}` do vs `${VAR:=default}`? | `:-` substitutes default; `:=` substitutes and assigns |
Part 2: Traps — Your Script's Insurance Policy¶
A trap is a function that fires when your script receives a signal or exits. Without traps, Ctrl+C leaves temp files behind, lock files persist, and half-finished operations leave broken state.
The EXIT trap: clean up no matter what¶
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/deploy.XXXXXXXXXX")
cleanup() {
rm -rf "${WORK_DIR}"
echo "[cleanup] Removed ${WORK_DIR}" >&2
}
trap cleanup EXIT
The EXIT trap fires on normal exit, set -e termination, exit 1, Ctrl+C (SIGINT), and
SIGTERM. It does not fire on SIGKILL (kill -9) — nothing can trap SIGKILL, that's the
kernel forcibly terminating your process.
The ERR trap: know which line failed¶
on_error() {
local line=$1
local cmd=$2
echo "FAILED at line ${line}: ${cmd}" >&2
logger -t deploy-script "FAILED at line ${line}: ${cmd}"
}
trap 'on_error ${LINENO} "${BASH_COMMAND}"' ERR
${BASH_COMMAND} holds the command that just failed. Combined with ${LINENO}, your error
messages go from "something broke" to "line 47: curl -sf https://api.internal/health-check
failed."
Under the Hood: The `ERR` trap respects the same exceptions as `set -e`. It won't fire for failures inside `if` conditions or after `&&`/`||`. This is by design — the Bash manual says the ERR trap "is not executed if the failed command is part of the command list immediately following a while or until keyword, part of the test in an if statement, part of a command executed in a && or || list."
Signal traps: graceful shutdown¶
shutdown_requested=false
handle_signal() {
local sig=$1
echo "Received ${sig}, finishing current host..." >&2
shutdown_requested=true
}
trap 'handle_signal SIGTERM' TERM
trap 'handle_signal SIGINT' INT
for host in "${HOSTS[@]}"; do
if [[ "${shutdown_requested}" == true ]]; then
echo "Shutdown requested. ${#HOSTS[@]} hosts remaining." >&2
break
fi
deploy_to_host "${host}"
done
This is the difference between a deploy that aborts mid-rsync (data corruption risk) and one that finishes the current host, then stops cleanly.
War Story: The Valve Steam client for Linux shipped a bug where an unquoted variable could cause `rm -rf "/"*` to execute. The specific pattern was `rm -rf "$STEAMROOT/"*` where `STEAMROOT` could be empty. The fix is exactly the pattern we'll cover next: validate the variable before using it, and quote everything. GitHub issue #3671 on ValveSoftware/steam-for-linux documents the full timeline.
Part 3: Arrays — The Most Underused Bash Feature¶
Most Bash scripts use arrays like they're afraid of them. They're not hard. They're essential.
Indexed arrays¶
# Declare and populate
declare -a HOSTS=("web-01" "web-02" "db-01" "cache-01")
# Append
HOSTS+=("monitor-01")
# Length
echo "${#HOSTS[@]}" # 5
# Iterate (safe — handles spaces in values)
for host in "${HOSTS[@]}"; do
echo "Deploying to ${host}"
done
# Iterate with index
for i in "${!HOSTS[@]}"; do
echo "[${i}/${#HOSTS[@]}] ${HOSTS[$i]}"
done
# Slice (elements 1 through 3)
echo "${HOSTS[@]:1:3}" # web-02 db-01 cache-01
# Delete element (leaves a gap — arrays are sparse!)
unset 'HOSTS[2]'
Gotcha: `unset 'HOSTS[2]'` does not reindex. After unsetting index 2, the array has indices 0, 1, 3, 4. If you iterate with `for i in $(seq 0 ${#HOSTS[@]})`, you'll skip entries and hit unset indices. Always use `"${!HOSTS[@]}"` for safe index iteration.
Associative arrays (Bash 4.0+)¶
The feature most Bash programmers don't know exists:
declare -A DEPLOY_STATUS
DEPLOY_STATUS[web-01]="success"
DEPLOY_STATUS[web-02]="failed"
DEPLOY_STATUS[db-01]="skipped"
# Check if key exists
if [[ -v DEPLOY_STATUS[web-03] ]]; then
echo "Found"
else
echo "No entry for web-03"
fi
# Iterate keys
for host in "${!DEPLOY_STATUS[@]}"; do
echo "${host}: ${DEPLOY_STATUS[$host]}"
done
# Use it as a seen-set (dedup without sort|uniq)
declare -A SEEN
while IFS= read -r line; do
[[ -v SEEN["$line"] ]] && continue
SEEN["$line"]=1
echo "${line}"
done < input.txt
Trivia: Bash arrays are zero-indexed. Zsh arrays are one-indexed by default. This difference has caused countless porting bugs and decades of flamewars. Associative arrays arrived in Bash 4.0 (2009) — meaning they're unavailable on macOS's default Bash 3.2 (Apple ships an old version due to GPLv3 licensing) and on RHEL/CentOS 6.
Part 4: Parameter Expansion — The Cheat Sheet That Replaces 5 External Commands¶
Every time you fork to sed, awk, basename, dirname, or cut for simple string
operations, you're paying a process creation cost. Parameter expansion does it in the current
shell, zero forks.
The cheat sheet¶
filepath="/var/log/nginx/access.log.gz"
# Strip path (basename equivalent)
echo "${filepath##*/}" # access.log.gz
# Strip filename (dirname equivalent)
echo "${filepath%/*}" # /var/log/nginx
# Remove shortest suffix match
echo "${filepath%.*}" # /var/log/nginx/access.log
# Remove longest suffix match
echo "${filepath%%.*}" # /var/log/nginx/access
# Remove shortest prefix match
echo "${filepath#*/}" # var/log/nginx/access.log.gz
# Remove longest prefix match
echo "${filepath##*/}" # access.log.gz
# Substitution (first match)
echo "${filepath/log/LOG}" # /var/LOG/nginx/access.log.gz
# Substitution (all matches)
echo "${filepath//log/LOG}" # /var/LOG/nginx/access.LOG.gz
# Length
echo "${#filepath}" # 28
# Substring (offset, length)
echo "${filepath:5:3}" # log
# Uppercase / lowercase (Bash 4.0+)
name="deploy_script"
echo "${name^^}" # DEPLOY_SCRIPT
echo "${name^}" # Deploy_script (first char only)
Default values — your missing config guard¶
# Use default if unset or empty
DB_HOST="${DB_HOST:-localhost}"
# Use default if unset or empty AND assign it
DB_PORT="${DB_PORT:=5432}"
# Error if unset or empty (with message)
DEPLOY_ENV="${DEPLOY_ENV:?DEPLOY_ENV must be set}"
# Use alternative value if SET (opposite of :-)
LOG_PREFIX="${VERBOSE:+[VERBOSE] }"
Remember: The colon (`:`) matters. `${VAR-default}` only triggers when VAR is unset. `${VAR:-default}` triggers when VAR is unset or empty. In production you almost always want the colon form — an empty string is rarely a valid configuration value.
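The difference is easy to verify. Here's a minimal check of all four combinations of {unset, empty} against {no colon, colon}:

```shell
#!/usr/bin/env bash
# Four combinations: {unset, empty} x {no colon, colon}.
unset VAR
no_colon_unset="${VAR-fallback}"   # fallback — VAR is unset
colon_unset="${VAR:-fallback}"     # fallback — unset triggers the colon form too

VAR=""
no_colon_empty="${VAR-fallback}"   # empty string — VAR IS set, just empty
colon_empty="${VAR:-fallback}"     # fallback — the colon form also covers empty

printf '%s|%s|%s|%s\n' \
  "${no_colon_unset}" "${colon_unset}" "${no_colon_empty}" "${colon_empty}"
# → fallback|fallback||fallback
```

The third field is the trap: without the colon, an empty-but-set variable sails straight through.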
Before (five forks):
dir=$(dirname "$filepath")
base=$(basename "$filepath")
ext=$(echo "$filepath" | sed 's/.*\.//')
name=$(echo "$filepath" | sed 's/\..*//')
upper=$(echo "$name" | tr '[:lower:]' '[:upper:]')
After (zero forks):
dir="${filepath%/*}"
base="${filepath##*/}"
ext="${filepath##*.}"
name="${filepath%%.*}"
upper="${name^^}"
Five external processes eliminated. On a loop processing 10,000 files, that's 50,000 fewer forks. The difference is measurable.
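You can measure the difference yourself with the `time` keyword. A rough sketch — the exact numbers depend on your machine, but the fork-based loop is typically an order of magnitude slower:

```shell
#!/usr/bin/env bash
# Compare 500 basename forks against 500 pure parameter expansions.
f="/var/log/nginx/access.log.gz"

with_forks() {
  local i
  for ((i = 0; i < 500; i++)); do
    basename "${f}" >/dev/null      # one fork+exec per iteration
  done
}

with_expansion() {
  local i
  for ((i = 0; i < 500; i++)); do
    : "${f##*/}"                    # no fork — expansion in the current shell
  done
}

time with_forks
time with_expansion
```

Both produce the same result (`access.log.gz`); only the cost differs.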
Flashcard Check: Expansion and Arrays¶
| Question | Answer |
|---|---|
| `${var%.*}` vs `${var%%.*}` on `"file.tar.gz"` | `file.tar` vs `file` — `%` is shortest match, `%%` is longest |
| How do you check if a key exists in an associative array? | `[[ -v ARRAY[key] ]]` |
| `${var:?message}` — what happens if var is empty? | Script exits with message printed to stderr |
| Why is `"${HOSTS[@]}"` safer than `${HOSTS[*]}`? | `@` preserves element boundaries; `*` joins into one string |
Part 5: Process Substitution — No More Temp Files¶
<(command) creates a temporary file descriptor containing the command's stdout and passes
its path as a filename. It's process substitution, and it eliminates an entire category of
temp-file management.
# Compare config across two servers — no temp files
diff <(ssh web-01 'cat /etc/nginx/nginx.conf') \
<(ssh web-02 'cat /etc/nginx/nginx.conf')
# Feed a while loop without creating a subshell (the pipe problem)
while IFS= read -r line; do
count=$((count + 1))
done < <(grep -c ERROR /var/log/app/*.log)
# count is correct here — no subshell!
Under the Hood: `<(command)` runs the command in a subshell with its stdout connected to a pipe, and substitutes a path for that pipe — a file descriptor path like `/dev/fd/63` where the OS supports it, or a named FIFO otherwise. The calling command just sees a regular filename. Process substitution came to Bash from ksh and remains one of the most powerful yet underused shell features.
Output substitution: >(command)¶
The reverse — lets you send output to a command as if writing to a file:
# Tee to multiple destinations simultaneously
some_command | tee >(gzip > archive.gz) >(sha256sum > checksum.txt) > plain.txt
# Log and process simultaneously
deploy_script 2>&1 | tee >(logger -t deploy) >(grep ERROR > errors.txt)
Gotcha: `>(command)` runs asynchronously. The main pipeline may finish before all branches complete. For critical writes, add an explicit `wait` afterward.
Part 6: Here Docs and Here Strings¶
Here documents — three flavors with different rules¶
# 1. Variable expansion ON (unquoted delimiter):
cat <<EOF
Host: ${HOSTNAME}
Date: $(date)
EOF
# 2. Variable expansion OFF (quoted delimiter):
ssh remote-host bash <<'EOF'
echo "Running on $(hostname)" # Expands on REMOTE host
echo "User: ${USER}" # Remote user, not local
EOF
# 3. Strip leading tabs (dash before delimiter):
if true; then
cat <<-EOF
This is indented with tabs in source
but printed without them
EOF
fi
Gotcha: The `<<-EOF` form strips tabs only, not spaces. If your editor converts tabs to spaces (most do by default), `<<-EOF` does nothing and your heredoc content has unexpected leading whitespace. This is the #1 heredoc complaint in shell scripting forums.
Here strings — <<<¶
# Instead of: echo "$variable" | grep pattern
grep pattern <<< "$variable"
# Why? No pipe = no subshell = no fork. Also cleaner:
read -r first rest <<< "hello world and more"
echo "${first}" # hello
echo "${rest}" # world and more
Trivia: Here strings (`<<<`) were originally a zsh feature; Bash adopted them in version 2.05b (2002). They avoid the subshell of piping from `echo` because no pipeline is created — Bash simply redirects stdin from the string (historically via a temporary file).
Part 7: Subshells vs. Command Groups — The Scope Trap¶
This is the bug that wastes hours because the code looks correct.
The pipe problem¶
# BROKEN: count stays 0
count=0
cat hosts.txt | while read -r host; do
(( count++ ))
done
echo "${count}" # Always 0!
Why? The pipe creates a subshell for the while loop. Variables set inside it die when the
subshell exits.
Subshells () vs. command groups {}¶
# Subshell: runs in a child process, changes don't propagate
x=1
(x=2; echo "inside: $x") # inside: 2
echo "outside: $x" # outside: 1
# Command group: runs in current shell, changes persist
x=1
{ x=2; echo "inside: $x"; } # inside: 2
echo "outside: $x" # outside: 2
The fix for the pipe problem:
# Option 1: redirect instead of pipe
count=0
while IFS= read -r host; do
(( count++ ))
done < hosts.txt
echo "${count}" # Correct!
# Option 2: process substitution
count=0
while IFS= read -r host; do
(( count++ ))
done < <(some_command)
echo "${count}" # Correct!
# Option 3: lastpipe (Bash 4.2+, not in subshells)
shopt -s lastpipe
count=0
cat hosts.txt | while IFS= read -r host; do
(( count++ ))
done
echo "${count}" # Correct with lastpipe!
Mental Model: Think of `|` as a fork in the road. Everything to the right of the pipe runs in a parallel universe. When that universe ends, its variables go with it. Redirections (`<`) and process substitution (`< <(...)`) keep you in the same universe.
Part 8: Coprocesses and Named Pipes¶
Named pipes (FIFOs)¶
A named pipe is a file that acts as a pipe between unrelated processes:
# Create a named pipe
mkfifo /tmp/deploy-pipe
# Writer (in one shell or background process):
echo "web-01 deployed" > /tmp/deploy-pipe
# Reader (blocks until data arrives):
read -r status < /tmp/deploy-pipe
echo "${status}" # web-01 deployed
# Clean up
rm /tmp/deploy-pipe
Use case: decoupling a producer from a consumer without temp files. A monitoring script writes status updates to the FIFO, a dashboard script reads them.
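A self-contained sketch of that producer/consumer pattern, using a throwaway temp directory rather than a fixed `/tmp` path (the status message is illustrative):

```shell
#!/usr/bin/env bash
# Producer/consumer over a FIFO in a throwaway temp directory.
set -euo pipefail

workdir=$(mktemp -d)
fifo="${workdir}/status.pipe"
mkfifo "${fifo}"

# Producer: backgrounded, because opening a FIFO for writing
# blocks until a reader opens the other end.
echo "web-01 deployed" > "${fifo}" &

# Consumer: blocks until the producer's line arrives.
IFS= read -r status < "${fifo}"
echo "got: ${status}"

wait                    # reap the background writer
rm -rf "${workdir}"
```

The blocking-open behavior is the point: neither side proceeds until both ends of the pipe are connected.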
Coprocesses (Bash 4.0+)¶
Bash's least-known feature. A coprocess is a background process with bidirectional pipes:
coproc DEPLOY { bash -c '
while IFS= read -r host; do
echo "Deploying to ${host}..."
sleep 1
echo "OK:${host}"
done
'; }
# Write to the coprocess
echo "web-01" >&"${DEPLOY[1]}"
echo "web-02" >&"${DEPLOY[1]}"
# Read from it
read -r result <&"${DEPLOY[0]}"
echo "${result}" # Deploying to web-01...
Trivia: Despite being over 15 years old (added in Bash 4.0, 2009), `coproc` remains so obscure that many experienced Bash programmers have never used it. For most use cases, named pipes or process substitution are simpler.
Part 9: Lock Files and flock — Cron Job Safety¶
The inherited deploy script ran twice simultaneously because two cron jobs overlapped. This is how you prevent that.
The naive approach (broken)¶
# BROKEN: race condition between check and create
if [[ -f /var/run/deploy.lock ]]; then
echo "Already running" >&2
exit 1
fi
echo $$ > /var/run/deploy.lock
Between the -f check and the echo, another process can slip through. This is a textbook
TOCTOU (time-of-check to time-of-use) race.
The atomic approach: mkdir¶
LOCKDIR="/var/run/deploy.lock"
if ! mkdir "${LOCKDIR}" 2>/dev/null; then
echo "Already running (lock: ${LOCKDIR})" >&2
exit 1
fi
trap 'rm -rf "${LOCKDIR}"' EXIT
Under the Hood: `mkdir` is atomic on all POSIX filesystems. The kernel creates the directory in a single syscall that either succeeds or fails — there's no race window. This is why `mkdir` works as a lock primitive even though `touch` + check doesn't.
The right approach: flock¶
#!/usr/bin/env bash
set -euo pipefail
LOCKFILE="/var/run/deploy.lock"
exec 200>"${LOCKFILE}"
if ! flock -n 200; then
echo "Deploy already running" >&2
exit 1
fi
# Lock is held for the lifetime of fd 200
# It releases automatically when the script exits
echo "Starting deploy..."
Why flock over mkdir:
- Automatic release. If the script crashes, the OS closes the file descriptor, and the
lock releases. mkdir locks persist after crashes.
- Blocking mode. flock (without -n) waits for the lock. mkdir is try-or-fail only.
- No cleanup needed. No trap required for lock release.
The cron pattern:
# In crontab — flock wraps the entire command
*/5 * * * * /usr/bin/flock -n /var/run/deploy.lock /opt/scripts/deploy.sh
One line. No lock management code in the script at all.
Gotcha: `flock` uses advisory locking via the `flock(2)` syscall. It only works if all competing processes use `flock` on the same file — a process that doesn't use `flock` can still write to the file. Also, `flock` doesn't work reliably on some NFS mounts; if your lock file is on NFS, test carefully or use `mkdir` instead.
Flashcard Check: Pipes, Locks, Scope¶
| Question | Answer |
|---|---|
| Why does a variable set inside `cmd \| while read` disappear? | The pipe creates a subshell; variables die with it |
| `flock -n` vs `flock` (no flag) | `-n` fails immediately if locked; without it, blocks until the lock is available |
| Why is `mkdir` atomic for locking but `touch` isn't? | `mkdir` is a single syscall that fails if the directory exists; `touch` + check is two operations with a race window |
| What is a FIFO? | A named pipe — a file that acts as a pipe between unrelated processes |
Part 10: Signal Handling — Graceful Shutdown¶
Your deploy script is iterating through 50 hosts. Someone sends SIGTERM (or a container orchestrator does, because it's shutting down the pod). What happens?
Default behavior vs. trapped behavior¶
| Signal | Number | Default action | What you want |
|---|---|---|---|
| `SIGINT` | 2 | Terminate | Finish current host, then stop |
| `SIGTERM` | 15 | Terminate | Finish current host, clean up, exit |
| `SIGQUIT` | 3 | Core dump | Almost never trap this |
| `SIGKILL` | 9 | Terminate (uncatchable) | Nothing — you can't trap it |
| `SIGUSR1` | 10 | Terminate | Status report, log dump |
The graceful shutdown pattern¶
CURRENT_HOST=""
SHUTDOWN=false
graceful_shutdown() {
echo "Shutdown requested. Finishing ${CURRENT_HOST:-nothing}..." >&2
SHUTDOWN=true
}
trap graceful_shutdown TERM INT
for host in "${HOSTS[@]}"; do
[[ "${SHUTDOWN}" == true ]] && break
CURRENT_HOST="${host}"
deploy_to_host "${host}"
done
CURRENT_HOST=""
if [[ "${SHUTDOWN}" == true ]]; then
echo "Stopped early. Completed hosts logged to ${WORK_DIR}/completed.txt" >&2
exit 130 # Convention: 128 + signal number (SIGINT=2)
fi
Remember: Exit code convention for signals: 128 + signal number. SIGINT (2) = exit 130. SIGTERM (15) = exit 143. This is how parent processes (including systemd and Kubernetes) know why a child exited.
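You can see the convention in action by killing a child process and inspecting the status `wait` reports:

```shell
#!/usr/bin/env bash
# A child killed by SIGTERM (signal 15) exits with status 128 + 15 = 143.
sleep 30 &
pid=$!

kill -TERM "${pid}"
wait "${pid}" && rc=$? || rc=$?   # capture the status whether wait "fails" or not

echo "child exit status: ${rc}"   # 143
```

The `&& rc=$? || rc=$?` idiom captures the status even though `wait` returns non-zero here — handy in scripts running under `set -e`.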
Part 11: Debugging — Beyond echo¶
set -x and custom PS4¶
# Basic trace
set -x
# Every command is printed before execution, prefixed with +
# Better: custom PS4 shows file, line, and function
export PS4='+(${BASH_SOURCE[0]}:${LINENO}): ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
set -x
Output becomes:
+(deploy.sh:47): main(): curl -sf https://api.internal/health
+(deploy.sh:48): main(): [[ 200 == 200 ]]
The DEBUG trap¶
Fires before every simple command. Combined with BASH_COMMAND (the about-to-execute
command), you get a trace without set -x noise. The bashdb project uses this mechanism
to implement a full step-through debugger entirely in Bash.
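A minimal sketch of the idea — collecting a trace into an array instead of printing `set -x` noise (the traced commands here are placeholders):

```shell
#!/usr/bin/env bash
# Record every command the script runs, via the DEBUG trap.
trace=()
trap 'trace+=("${BASH_COMMAND}")' DEBUG

x=$((6 * 7))
result="answer=${x}"

trap - DEBUG                     # stop tracing
printf '%s\n' "${trace[@]}"      # replay the recorded commands
```

Each entry is the text of a command as it was about to execute — the raw material a debugger like bashdb builds on.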
Syntax check without running¶
bash -n deploy.sh
Catches syntax errors (unmatched quotes, bad if/fi pairing) without executing anything. Fast, safe, belongs in your CI pipeline.
ShellCheck — the lint that catches what you miss¶
ShellCheck is not a style checker. It catches real bugs: unquoted variables (SC2086), unused
variables (SC2034), local masking exit codes (SC2155), POSIX compatibility issues, and
dozens more. If you write Bash for production and don't use ShellCheck, you're flying blind.
Part 12: Performance — Why Your Read Loop Is Slow¶
The fork problem¶
Every external command (sed, awk, grep, basename, cut) creates a new process.
fork() + exec() is cheap for one call. It's devastating in a loop.
Before (slow — 10,000 forks):
while IFS= read -r filepath; do
filename=$(basename "${filepath}")
extension=$(echo "${filepath}" | sed 's/.*\.//')
echo "${filename} has extension ${extension}"
done < file-list.txt
After (fast — zero forks):
while IFS= read -r filepath; do
filename="${filepath##*/}"
extension="${filepath##*.}"
echo "${filename} has extension ${extension}"
done < file-list.txt
mapfile/readarray — bulk read¶
# Instead of a while-read loop to build an array:
mapfile -t HOSTS < hosts.txt
# With a command:
mapfile -t ERRORS < <(grep ERROR /var/log/app.log)
# Now ERRORS is a proper array
echo "Found ${#ERRORS[@]} errors"
mapfile (alias readarray, added in Bash 4.0) reads lines into an array in a single
operation. No loop, no subshell, no line-by-line overhead.
printf vs echo¶
# echo is non-portable and has gotchas:
echo -e "tab\there" # Works in bash, not in some sh implementations
echo -n "no newline" # Works in bash, not in all echo implementations
# printf is POSIX, predictable, and faster for formatted output:
printf 'tab\there\n'
printf '%s\n' "no newline issues"
printf '%-20s %s\n' "${host}" "${status}" # Formatted table
Under the Hood: `echo` behavior varies between shells, between versions of the same shell, and even between builds (depending on compile-time options like `--enable-xpg-echo`). `printf` is specified by POSIX and behaves identically everywhere. In production scripts, use `printf` for anything more complex than a simple string.
Part 13: Anti-Patterns — Things That Work Until They Don't¶
Parsing ls¶
# BROKEN: fails on filenames with spaces, newlines, or special chars
for file in $(ls /var/log/*.log); do
echo "Processing ${file}"
done
# CORRECT: use globbing directly
for file in /var/log/*.log; do
[[ -e "${file}" ]] || continue # Handle no-match case
echo "Processing ${file}"
done
Remember: Set `shopt -s nullglob` to make globs expand to nothing (instead of the literal pattern) when no files match. Without it, `/var/log/*.xyz` iterates once with the literal string `"/var/log/*.xyz"` as the value.
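A quick way to see what `nullglob` changes (the directory name is a stand-in that's assumed not to exist):

```shell
#!/usr/bin/env bash
# Glob with no matches: literal pattern vs. empty expansion.
shopt -u nullglob
matches=(/no-such-dir-hopefully/*.xyz)
literal="${matches[0]}"          # the unexpanded pattern string

shopt -s nullglob
matches=(/no-such-dir-hopefully/*.xyz)
count="${#matches[@]}"           # 0 — the glob vanished

echo "without nullglob: ${literal}"
echo "with nullglob: ${count} matches"
```

Without `nullglob`, the loop body runs once with a filename that doesn't exist — which is exactly what the `[[ -e "${file}" ]] || continue` guard above defends against.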
Testing with == inside [ ]¶
# Non-portable: == is a bashism inside [ ]
[ "$status" == "active" ] # Works in bash, fails in dash/sh
# POSIX-correct:
[ "$status" = "active" ] # Single = inside [ ]
# Best: use [[ ]] in bash scripts
[[ "$status" == "active" ]] # Safe, supports patterns and regex too
Unquoted variables¶
# The war story: an unquoted variable that deleted the wrong directory
# This script was supposed to clean up old deploy artifacts:
DEPLOY_DIR="" # Bug: variable was empty due to failed config read
rm -rf ${DEPLOY_DIR}/releases/*
# Became: rm -rf /releases/*
# Which deleted /releases/ at the filesystem root
War Story: A variant of this bug hit the Valve Steam client in 2015. The script contained `rm -rf "$STEAMROOT/"*` where `STEAMROOT` could be empty, potentially evaluating to `rm -rf "/"*`. The fix was simple: `rm -rf "${STEAMROOT:?}/"*` — the `:?` expansion aborts the script with an error if the variable is unset or empty, instead of proceeding with the deletion. GitHub issue #3671 on ValveSoftware/steam-for-linux has the full discussion.
Useless cat¶
# Useless cat — adds a process for no reason
cat file.txt | grep pattern
# Direct redirect
grep pattern file.txt
# Or, when you need multiple commands:
grep pattern < file.txt
This matters in loops and pipelines. One unnecessary cat in a loop processing 10,000 files
is 10,000 unnecessary forks.
Part 14: Portability — When Bash Isn't Bash¶
Bashisms that break on dash/sh¶
Debian and Ubuntu use dash (not bash) as /bin/sh. If your shebang says #!/bin/sh but
your script uses Bash features, it will fail silently or with cryptic errors.
| Bash feature | POSIX sh equivalent |
|---|---|
| `[[ ]]` | `[ ]` (with careful quoting) |
| `(( ))` arithmetic | `$((expr))` or `expr` |
| `${var//pattern/replace}` | `sed` or `expr` |
| Arrays | No equivalent (use positional parameters) |
| `<(...)` process substitution | Temp files |
| `<<<` here strings | `echo "$var" \| cmd` |
| `{1..10}` brace expansion | `seq 1 10` |
| `[[ $x =~ regex ]]` | `echo "$x" \| grep -qE 'regex'` |
# When invoked as sh, Bash runs in POSIX-compatibility mode.
# That changes some behaviors, but most extensions still work —
# the real portability danger is a system where /bin/sh is a
# different shell entirely, such as dash.
# Always use the correct shebang:
#!/usr/bin/env bash # For bash scripts (finds bash in PATH)
#!/bin/sh # ONLY for POSIX-compatible scripts
Trivia: Bash deliberately extends POSIX `sh` syntax. When invoked as `sh` (via a symlink), Bash enters a POSIX-compatibility mode — but that mode changes only selected behaviors; most extensions like `[[` and arrays still work, which hides portability bugs until the script runs under a true POSIX shell such as dash. The same binary behaves differently depending on the name it was called with: `argv[0]` determines the mode.
Flashcard Check: Anti-Patterns and Portability¶
| Question | Answer |
|---|---|
| Why shouldn't you parse `ls` output? | Filenames with spaces/newlines/special chars break the pipeline |
| What's wrong with `[ "$x" == "y" ]`? | `==` inside `[ ]` is a bashism; use `=` for POSIX compatibility |
| `${DEPLOY_DIR:?must be set}` — what does it do? | Exits with an error message if `DEPLOY_DIR` is unset or empty |
| Name a Bash feature that doesn't work in dash | Any of: `[[`, `(( ))`, arrays, process substitution, `<<<` |
Part 15: Real Patterns — Production Recipes¶
Retry with exponential backoff¶
retry() {
local max_attempts=$1; shift
local delay=$1; shift
local attempt=1
while (( attempt <= max_attempts )); do
if "$@"; then
return 0
fi
echo "Attempt ${attempt}/${max_attempts} failed. Retrying in ${delay}s..." >&2
sleep "${delay}"
delay=$(( delay * 2 ))
(( attempt++ ))
done
echo "All ${max_attempts} attempts failed" >&2
return 1
}
# Usage: retry <max_attempts> <initial_delay_seconds> <command...>
retry 5 2 curl -sf "https://api.internal/health"
Parallel execution with controlled concurrency¶
MAX_PARALLEL=10
for host in "${HOSTS[@]}"; do
deploy_to_host "${host}" &
# Throttle: if we've hit the limit, wait for one to finish
if (( $(jobs -r | wc -l) >= MAX_PARALLEL )); then
wait -n # Wait for any one job to finish (Bash 4.3+)
fi
done
wait # Wait for all remaining jobs
echo "All deployments complete"
Safe temp file management¶
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/deploy.XXXXXXXXXX")
trap 'rm -rf "${WORK_DIR}"' EXIT
# All temp files go in WORK_DIR — one cleanup handles everything
DEPLOY_LOG="${WORK_DIR}/deploy.log"
HOST_LIST="${WORK_DIR}/hosts.txt"
RESULTS="${WORK_DIR}/results.json"
Exercises¶
Exercise 1: Fix the broken deploy script (5 minutes)¶
This script has 5 bugs. Find and fix them all:
#!/bin/sh
# deploy.sh - Deploy to all hosts
DEPLOY_DIR=$DEPLOY_BASE/releases
HOSTS=$(cat hosts.txt)
for host in $HOSTS; do
result=$(ssh $host "systemctl restart app")
if [ $result == "ok" ]; then
echo "Success: $host"
fi
done
rm -rf $DEPLOY_DIR/old/*
Bugs and fixes
1. `#!/bin/sh` but the script may use bashisms — use `#!/usr/bin/env bash`
2. No `set -euo pipefail`
3. `$DEPLOY_DIR` unquoted and `$DEPLOY_BASE` might be empty — `rm -rf` risk
4. `$HOSTS` unquoted — word splitting breaks on hostnames with special chars
5. `[ $result == "ok" ]` — unquoted `$result` and `==` inside `[ ]`
Fixed version:
#!/usr/bin/env bash
set -euo pipefail
DEPLOY_DIR="${DEPLOY_BASE:?DEPLOY_BASE must be set}/releases"
mapfile -t HOSTS < hosts.txt
for host in "${HOSTS[@]}"; do
result=$(ssh "${host}" "systemctl restart app")
if [[ "${result}" == "ok" ]]; then
echo "Success: ${host}"
fi
done
rm -rf "${DEPLOY_DIR:?}/old/"*
Exercise 2: Write a locking wrapper (10 minutes)¶
Write a function with_lock that takes a lock file path and a command, runs the command with
flock, and returns the command's exit code. It should fail fast if the lock is held.
Solution
with_lock() {
local lockfile=$1; shift
local fd
exec {fd}>"${lockfile}"
if ! flock -n "${fd}"; then
echo "Lock held: ${lockfile}" >&2
exec {fd}>&-
return 1
fi
"$@"
local rc=$?
exec {fd}>&-   # fds opened with exec outlive the function — close explicitly to release the lock
return "${rc}"
}
# Usage:
with_lock /var/run/deploy.lock deploy_to_all_hosts
Exercise 3: Build a retry-with-logging wrapper (15 minutes)¶
Combine the retry pattern from Part 15 with the structured logging pattern. The wrapper should log each attempt, the delay before retry, and the final success/failure. Use an associative array to track attempt timestamps and results.
Hint
Use `declare -A ATTEMPT_LOG`, `date +%s` for timestamps, and the `log` function pattern from the primer. The retry function should populate the associative array as it goes.
Cheat Sheet¶
Strict mode¶
| Setting | What it does | When it bites |
|---|---|---|
| `set -e` | Exit on non-zero return | Silent in `if`, `&&`, `\|\|`, `local x=$(...)` |
| `set -u` | Exit on unset variable | Breaks `$@` in Bash <4.4; use `${VAR:-}` for optional vars |
| `set -o pipefail` | Pipeline fails on first error | Changes exit code of `grep \| wc` pipelines |
| `IFS=$'\n\t'` | Word-split on newline/tab only | Doesn't replace quoting |
Parameter expansion¶
| Syntax | Result (for `f="/a/b.tar.gz"`) | Replaces |
|---|---|---|
| `${f##*/}` | `b.tar.gz` | `basename` |
| `${f%/*}` | `/a` | `dirname` |
| `${f%.*}` | `/a/b.tar` | Remove last extension |
| `${f%%.*}` | `/a/b` | Remove all extensions |
| `${f##*.}` | `gz` | Get extension |
| `${f/tar/zip}` | `/a/b.zip.gz` | `sed` substitution |
| `${f^^}` | `/A/B.TAR.GZ` | `tr '[:lower:]' '[:upper:]'` |
| `${#f}` | `11` | `wc -c` |
Traps¶
| Trap | Fires when | Common use |
|---|---|---|
| `EXIT` | Script exits (any reason except SIGKILL) | Temp file cleanup |
| `ERR` | Command fails (same exceptions as `set -e`) | Error logging |
| `INT` | Ctrl+C (SIGINT) | Graceful shutdown |
| `TERM` | SIGTERM received | Graceful shutdown |
| `DEBUG` | Before every command | Tracing, profiling |
Locking¶
| Method | Atomic? | Auto-release on crash? | Blocking mode? |
|---|---|---|---|
| `test -f && touch` | No (race condition) | No | No |
| `mkdir` | Yes | No | No |
| `flock` | Yes | Yes (fd closes) | Yes |
Debugging¶
| Tool | Command | What it shows |
|---|---|---|
| Trace | `set -x` | Every command before execution |
| Custom trace | `PS4='+(${BASH_SOURCE}:${LINENO}): '` | File:line in trace output |
| Syntax check | `bash -n script.sh` | Parse errors without running |
| Lint | `shellcheck -s bash script.sh` | Real bugs, not just style |
| Profiling | `trap 'echo "$(date +%s.%N) $BASH_COMMAND"' DEBUG` | Per-command timestamps |
Takeaways¶
- `set -euo pipefail` is necessary but not sufficient. Know the six contexts where `-e` is ignored, especially `local x=$(cmd)`. Separate declaration from assignment.
- `trap cleanup EXIT` is non-negotiable. Every script that creates temp files or acquires locks needs it. The EXIT trap fires on normal exit, errors, SIGINT, and SIGTERM.
- Parameter expansion replaces five commands. `${f##*/}`, `${f%.*}`, `${f:-default}` — learn these and your scripts get faster and cleaner overnight.
- Pipes create subshells; redirections don't. If a variable set inside a `while read` loop disappears, you piped into the loop. Use `< file` or `< <(cmd)` instead.
- Use `flock` for cron jobs. One line in crontab (`flock -n /path/to/lock command`) prevents the entire class of "script ran twice" bugs.
- ShellCheck isn't optional. It catches bugs that experts miss. Run it in CI.
Related Lessons¶
- The Hanging Deploy — processes, signals, and what happens when Ctrl+C doesn't work
- What Happens Inside a Linux Pipe — file descriptors, kernel buffers, and why pipes block
- Strace: Reading the Matrix — watching your script's syscalls in real time
- Text Processing: jq, awk, sed in the Trenches — when you outgrow parameter expansion