
Bash: The Patterns That Matter

  • lesson
  • bash-scripting
  • process-management
  • signals
  • file-descriptors
  • posix-compatibility

Topics: bash scripting, process management, signals, file descriptors, POSIX compatibility
Level: L1–L2 (Foundations → Operations)
Time: 75–90 minutes
Prerequisites: None (but you'll get more from it if you already write Bash daily)


The Mission

Your team inherited a deploy script. It's 200 lines of Bash that has been "working" in production for two years. Nobody wants to touch it. Last week it ran twice simultaneously and corrupted the deploy state. The week before, a failed health check was silently ignored and bad code shipped. And last month, someone hit Ctrl+C mid-deploy and the temp files never got cleaned up.

Your job: audit this script. Harden it. Make it the kind of Bash that survives 3 AM, survives concurrent cron jobs, survives the intern hitting Ctrl+C.

Along the way, you'll formalize patterns you probably already use intuitively — and discover a few you didn't know existed.


Part 1: The Safety Net — set -euo pipefail

Every production script starts here:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

You've seen this a thousand times. But do you know when each flag betrays you?

-e (errexit): The flag with more exceptions than rules

set -e says "exit on any non-zero return code." What it doesn't say: the Bash manual lists six contexts where -e is silently ignored. The exact rules span over 500 words in the man page and have been called "the most misunderstood feature in shell scripting."

set -e

# This WILL exit the script:
false

# This will NOT exit — command is in an if condition:
if false; then echo "won't print"; fi

# This will NOT exit — left side of && or ||:
false && echo "nope"
false || echo "this runs, no exit"

# THIS is the trap that bites everyone (inside a function):
local result=$(failing_command)   # local succeeds, so $? is 0!

Gotcha: local result=$(failing_command) swallows the exit code. local is itself a command, and its exit code (0, it created the variable successfully) masks the exit code of failing_command. The Bash manual documents this but nobody reads it.

Before (broken):

my_function() {
    local data=$(curl -sf "https://api.example.com/deploy-status")
    echo "${data}"
}

After (correct):

my_function() {
    local data
    data=$(curl -sf "https://api.example.com/deploy-status")
    echo "${data}"
}

Separate declaration from assignment. Always.

-u (nounset): When it bites

-u exits on unset variables — exactly what you want. Until you write a script that checks for optional environment variables:

set -u
# This exits with "DB_REPLICA_HOST: unbound variable":
echo "${DB_REPLICA_HOST}"

# Fix: use default values
echo "${DB_REPLICA_HOST:-}"          # Empty string if unset
echo "${DB_REPLICA_HOST:-localhost}" # Default if unset

Also bites with $@ in functions that accept zero arguments (fixed in Bash 4.4+, but you might be running Bash 4.2 on RHEL 7).
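The portable guard for that `$@` problem is worth seeing once. A minimal sketch (the function name is just for illustration):

```bash
#!/usr/bin/env bash
set -u

print_args() {
    # On Bash < 4.4, expanding "$@" with zero arguments trips nounset.
    # The classic portable guard: ${1+"$@"} expands to "$@" only when
    # $1 is set, and to nothing at all otherwise.
    for arg in ${1+"$@"}; do
        printf 'arg: %s\n' "$arg"
    done
    printf 'done\n'
}

print_args            # no arguments — safe even on old Bash
print_args "a b" c    # quoting preserved: two arguments, not three
```

On Bash 4.4+ the plain `"$@"` form works too; the guard only matters when you must support older interpreters.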

-o pipefail: The one you forget to test

Without pipefail, a pipeline's exit status is the last command's status. Meaning:

# Without pipefail: exit status is 0 (wc succeeded)
curl -sf https://broken.url | wc -l
echo $?  # 0 — looks fine!

# With pipefail: exit status is the failing curl
set -o pipefail
curl -sf https://broken.url | wc -l
echo $?  # curl's non-zero exit code — now you see the failure

The IFS line nobody explains

IFS=$'\n\t'

Default IFS is space, tab, newline. This removes space from the list. Why? Because word splitting on spaces is the #1 source of Bash bugs — filenames with spaces, paths with spaces, arguments with spaces. With space removed, unquoted expansions split only on newlines and tabs. It's a safety net, not a replacement for quoting.
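You can watch the effect directly. A sketch that uses `set --` to count how many words an unquoted expansion produces:

```bash
#!/usr/bin/env bash
s="one two"

IFS=$' \t\n'          # default IFS: space splits words
set -- $s
echo "default IFS: $# words"     # 2

IFS=$'\n\t'           # strict-mode IFS: space no longer splits
set -- $s
echo "strict IFS: $# words"      # 1
```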


Flashcard Check: Strict Mode

| Question | Answer |
| --- | --- |
| local x=$(false); echo $? — what prints? | 0 — local masks the exit code |
| Name two contexts where set -e is ignored | if conditions; left side of &&/\|\| |
| What does pipefail change about cmd1 \| cmd2? | Exit status becomes that of the rightmost failing command, instead of always the last command's |
| What does ${VAR:-default} do vs ${VAR:=default}? | :- substitutes default; := substitutes and assigns |

Part 2: Traps — Your Script's Insurance Policy

A trap is a function that fires when your script receives a signal or exits. Without traps, Ctrl+C leaves temp files behind, lock files persist, and half-finished operations leave broken state.

The EXIT trap: clean up no matter what

WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/deploy.XXXXXXXXXX")

cleanup() {
    rm -rf "${WORK_DIR}"
    echo "[cleanup] Removed ${WORK_DIR}" >&2
}
trap cleanup EXIT

The EXIT trap fires on normal exit, set -e termination, exit 1, Ctrl+C (SIGINT), and SIGTERM. It does not fire on SIGKILL (kill -9) — nothing can trap SIGKILL, that's the kernel forcibly terminating your process.
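You can verify the "fires on failure too" claim with a throwaway child shell (a sketch; the child exits 1, yet its trap still runs):

```bash
#!/usr/bin/env bash
# Run the failing script in a child bash so exit 1 doesn't kill us.
bash -c '
    trap "echo cleanup-ran" EXIT
    echo working
    exit 1
' || child_status=$?

echo "child exited with: ${child_status}"   # 1 — trap did not change it
```

Note the trap's own commands do not alter the exit status unless the trap itself calls exit.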

The ERR trap: know which line failed

on_error() {
    local line=$1
    local cmd=$2
    echo "FAILED at line ${line}: ${cmd}" >&2
    logger -t deploy-script "FAILED at line ${line}: ${cmd}"
}
trap 'on_error ${LINENO} "${BASH_COMMAND}"' ERR

${BASH_COMMAND} holds the command that just failed. Combined with ${LINENO}, your error messages go from "something broke" to "line 47: curl -sf https://api.internal/health-check failed."

Under the Hood: The ERR trap respects the same exceptions as set -e. It won't fire for failures inside if conditions or after &&/||. This is by design — the Bash manual says the ERR trap "is not executed if the failed command is part of the command list immediately following a while or until keyword, part of the test in an if statement, part of a command executed in a && or || list."

Signal traps: graceful shutdown

shutdown_requested=false

handle_signal() {
    local sig=$1
    echo "Received ${sig}, finishing current host..." >&2
    shutdown_requested=true
}
trap 'handle_signal SIGTERM' TERM
trap 'handle_signal SIGINT' INT

for host in "${HOSTS[@]}"; do
    if [[ "${shutdown_requested}" == true ]]; then
        echo "Shutdown requested. Skipping remaining hosts." >&2
        break
    fi
    deploy_to_host "${host}"
done

This is the difference between a deploy that aborts mid-rsync (data corruption risk) and one that finishes the current host, then stops cleanly.

War Story: The Valve Steam client had a bug where an unquoted variable in a script could cause rm -rf "/" to execute. The specific pattern was rm -rf "$STEAMROOT/"* where STEAMROOT was empty. The fix is a pattern we'll return to in Part 13: validate the variable before using it, and quote everything. GitHub issue #3671 on ValveSoftware/steam-for-linux documents the full timeline.


Part 3: Arrays — The Most Underused Bash Feature

Most Bash scripts use arrays like they're afraid of them. They're not hard. They're essential.

Indexed arrays

# Declare and populate
declare -a HOSTS=("web-01" "web-02" "db-01" "cache-01")

# Append
HOSTS+=("monitor-01")

# Length
echo "${#HOSTS[@]}"  # 5

# Iterate (safe — handles spaces in values)
for host in "${HOSTS[@]}"; do
    echo "Deploying to ${host}"
done

# Iterate with index
for i in "${!HOSTS[@]}"; do
    echo "[${i}/${#HOSTS[@]}] ${HOSTS[$i]}"
done

# Slice (elements 1 through 3)
echo "${HOSTS[@]:1:3}"  # web-02 db-01 cache-01

# Delete element (leaves a gap — arrays are sparse!)
unset 'HOSTS[2]'

Gotcha: unset 'HOSTS[2]' does not reindex. After unsetting index 2, the array has indices 0, 1, 3, 4. If you iterate with for i in $(seq 0 ${#HOSTS[@]}), you'll skip entries and hit unset indices. Always use "${!HOSTS[@]}" for safe index iteration.
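A minimal demonstration of the gap:

```bash
#!/usr/bin/env bash
# After unset, the length shrinks but the indices keep their hole —
# which is exactly why $(seq ...) iteration breaks.
declare -a HOSTS=("web-01" "web-02" "db-01" "cache-01")
unset 'HOSTS[2]'

echo "length:  ${#HOSTS[@]}"     # 3
echo "indices: ${!HOSTS[@]}"     # 0 1 3
```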

Associative arrays (Bash 4.0+)

The feature most Bash programmers don't know exists:

declare -A DEPLOY_STATUS

DEPLOY_STATUS[web-01]="success"
DEPLOY_STATUS[web-02]="failed"
DEPLOY_STATUS[db-01]="skipped"

# Check if key exists
if [[ -v DEPLOY_STATUS[web-03] ]]; then
    echo "Found"
else
    echo "No entry for web-03"
fi

# Iterate keys
for host in "${!DEPLOY_STATUS[@]}"; do
    echo "${host}: ${DEPLOY_STATUS[$host]}"
done

# Use it as a seen-set (dedup without sort|uniq)
declare -A SEEN
while IFS= read -r line; do
    [[ -v SEEN["$line"] ]] && continue
    SEEN["$line"]=1
    echo "${line}"
done < input.txt

Trivia: Bash arrays are zero-indexed. Zsh arrays are one-indexed by default. This difference has caused countless porting bugs and decades of flamewars. Associative arrays arrived in Bash 4.0 (2009) — meaning they're unavailable on macOS's default Bash 3.2 (Apple ships an old version due to GPLv3 licensing) and on any other system still stuck on a pre-4.0 Bash.


Part 4: Parameter Expansion — The Cheat Sheet That Replaces 5 External Commands

Every time you fork to sed, awk, basename, dirname, or cut for simple string operations, you're paying a process creation cost. Parameter expansion does it in the current shell, zero forks.

The cheat sheet

filepath="/var/log/nginx/access.log.gz"

# Strip path (basename equivalent)
echo "${filepath##*/}"          # access.log.gz

# Strip filename (dirname equivalent)
echo "${filepath%/*}"           # /var/log/nginx

# Remove shortest suffix match
echo "${filepath%.*}"           # /var/log/nginx/access.log

# Remove longest suffix match
echo "${filepath%%.*}"          # /var/log/nginx/access

# Remove shortest prefix match
echo "${filepath#*/}"           # var/log/nginx/access.log.gz

# Remove longest prefix match
echo "${filepath##*/}"          # access.log.gz

# Substitution (first match)
echo "${filepath/log/LOG}"      # /var/LOG/nginx/access.log.gz

# Substitution (all matches)
echo "${filepath//log/LOG}"     # /var/LOG/nginx/access.LOG.gz

# Length
echo "${#filepath}"             # 28

# Substring (offset, length)
echo "${filepath:5:3}"          # log

# Uppercase / lowercase (Bash 4.0+)
name="deploy_script"
echo "${name^^}"                # DEPLOY_SCRIPT
echo "${name^}"                 # Deploy_script (first char only)

Default values — your missing config guard

# Use default if unset or empty
DB_HOST="${DB_HOST:-localhost}"

# Use default if unset or empty AND assign it
DB_PORT="${DB_PORT:=5432}"

# Error if unset or empty (with message)
DEPLOY_ENV="${DEPLOY_ENV:?DEPLOY_ENV must be set}"

# Use alternative value if SET (opposite of :-)
LOG_PREFIX="${VERBOSE:+[VERBOSE] }"

Remember: The colon (:) matters. ${VAR-default} only triggers when VAR is unset. ${VAR:-default} triggers when VAR is unset or empty. In production, you almost always want the colon form — an empty string is rarely a valid configuration value.
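The difference is easy to see side by side (a sketch):

```bash
#!/usr/bin/env bash
unset VAR
echo "unset:           '${VAR-default}'"    # 'default'

VAR=""
echo "empty, no colon: '${VAR-default}'"    # '' — empty still counts as set
echo "empty, colon:    '${VAR:-default}'"   # 'default' — empty treated as unset
```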

Before (five forks):

dir=$(dirname "$filepath")
base=$(basename "$filepath")
ext=$(echo "$filepath" | sed 's/.*\.//')
name=$(echo "$filepath" | sed 's/\..*//')
upper=$(echo "$name" | tr '[:lower:]' '[:upper:]')

After (zero forks):

dir="${filepath%/*}"
base="${filepath##*/}"
ext="${filepath##*.}"
name="${filepath%%.*}"
upper="${name^^}"

Five external processes eliminated. On a loop processing 10,000 files, that's 50,000 fewer forks. The difference is measurable.


Flashcard Check: Expansion and Arrays

| Question | Answer |
| --- | --- |
| ${var%.*} vs ${var%%.*} on "file.tar.gz" | file.tar vs file — % is shortest match, %% is longest |
| How do you check if a key exists in an associative array? | [[ -v ARRAY[key] ]] |
| ${var:?message} — what happens if var is empty? | Script exits with message printed to stderr |
| Why is "${HOSTS[@]}" safer than ${HOSTS[*]}? | @ preserves element boundaries; * joins into one string |

Part 5: Process Substitution — No More Temp Files

<(command) hands the consuming command a filename — typically something like /dev/fd/63 — that reads from the command's stdout. It's process substitution, and it eliminates an entire category of temp-file management.

# Compare config across two servers — no temp files
diff <(ssh web-01 'cat /etc/nginx/nginx.conf') \
     <(ssh web-02 'cat /etc/nginx/nginx.conf')

# Feed a while loop without creating a subshell (the pipe problem)
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < <(grep ERROR /var/log/app/*.log)
# count is correct here — no subshell!

Under the Hood: On systems with /dev/fd, Bash creates an anonymous pipe and exposes its reading end as a path like /dev/fd/63; where /dev/fd is unavailable, it falls back to a named pipe (FIFO) in a temp directory. The command runs in a subshell with its stdout connected to the pipe, and the calling command just sees a regular filename. Process substitution has been in Bash for decades and remains one of the most powerful yet underused shell features.

Output substitution: >(command)

The reverse — lets you send output to a command as if writing to a file:

# Tee to multiple destinations simultaneously
some_command | tee >(gzip > archive.gz) >(sha256sum > checksum.txt) > plain.txt

# Log and process simultaneously
deploy_script 2>&1 | tee >(logger -t deploy) >(grep ERROR > errors.txt)

Gotcha: >(command) runs asynchronously. The main pipeline may finish before all branches complete. For critical writes, wait for the substitution explicitly — in Bash 4.4+, $! after a simple command holds the PID of the last process substitution, so wait "$!" covers it (a bare wait does not reliably include process substitutions).
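A sketch of that wait pattern (assumes Bash 4.4+, where $! tracks the last process substitution; note it applies to a simple command — inside a pipeline the substitution is created in a subshell and the parent's $! won't see it):

```bash
#!/usr/bin/env bash
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# The >(...) branch uppercases into $tmp while stdout is discarded.
tee >(tr 'a-z' 'A-Z' > "$tmp") > /dev/null <<< "hello"
wait "$!"              # without this, $tmp may still be empty here

cat "$tmp"             # HELLO
```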


Part 6: Here Docs and Here Strings

Here documents — three flavors with different rules

# 1. Variable expansion ON (unquoted delimiter):
cat <<EOF
Host: ${HOSTNAME}
Date: $(date)
EOF

# 2. Variable expansion OFF (quoted delimiter):
ssh remote-host bash <<'EOF'
echo "Running on $(hostname)"   # Expands on REMOTE host
echo "User: ${USER}"            # Remote user, not local
EOF

# 3. Strip leading tabs (dash before delimiter):
if true; then
    cat <<-EOF
    This is indented with tabs in source
    but printed without them
    EOF
fi

Gotcha: The <<-EOF form strips tabs only, not spaces. If your editor converts tabs to spaces (most do by default), <<-EOF does nothing and your heredoc content has unexpected leading whitespace. This is the #1 heredoc complaint in shell scripting forums.

Here strings — <<<

# Instead of: echo "$variable" | grep pattern
grep pattern <<< "$variable"

# Why? No pipe = no subshell = no fork. Also cleaner:
read -r first rest <<< "hello world and more"
echo "${first}"   # hello
echo "${rest}"    # world and more

Trivia: Here strings (<<<) were originally a zsh feature; Bash adopted them in version 2.05b (2002). They avoid the subshell penalty of piping from echo because no pipeline is created — Bash traditionally backs the redirection with a temporary file (recent versions may use an anonymous pipe for short strings) and redirects stdin from it.


Part 7: Subshells vs. Command Groups — The Scope Trap

This is the bug that wastes hours because the code looks correct.

The pipe problem

# BROKEN: count stays 0
count=0
cat hosts.txt | while read -r host; do
    (( count++ ))
done
echo "${count}"  # Always 0!

Why? The pipe creates a subshell for the while loop. Variables set inside it die when the subshell exits.

Subshells () vs. command groups {}

# Subshell: runs in a child process, changes don't propagate
x=1
(x=2; echo "inside: $x")    # inside: 2
echo "outside: $x"           # outside: 1

# Command group: runs in current shell, changes persist
x=1
{ x=2; echo "inside: $x"; } # inside: 2
echo "outside: $x"           # outside: 2

The fix for the pipe problem:

# Option 1: redirect instead of pipe
count=0
while IFS= read -r host; do
    (( count++ ))
done < hosts.txt
echo "${count}"  # Correct!

# Option 2: process substitution
count=0
while IFS= read -r host; do
    (( count++ ))
done < <(some_command)
echo "${count}"  # Correct!

# Option 3: lastpipe (Bash 4.2+; only takes effect when job control is off, as in scripts)
shopt -s lastpipe
count=0
cat hosts.txt | while IFS= read -r host; do
    (( count++ ))
done
echo "${count}"  # Correct with lastpipe!

Mental Model: Think of | as a fork in the road. Everything to the right of the pipe runs in a parallel universe. When that universe ends, its variables go with it. Redirections (<) and process substitution (< <(...)) keep you in the same universe.


Part 8: Coprocesses and Named Pipes

Named pipes (FIFOs)

A named pipe is a file that acts as a pipe between unrelated processes:

# Create a named pipe
mkfifo /tmp/deploy-pipe

# Writer (in one shell or background process):
echo "web-01 deployed" > /tmp/deploy-pipe

# Reader (blocks until data arrives):
read -r status < /tmp/deploy-pipe
echo "${status}"  # web-01 deployed

# Clean up
rm /tmp/deploy-pipe

Use case: decoupling a producer from a consumer without temp files. A monitoring script writes status updates to the FIFO, a dashboard script reads them.
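A self-contained sketch of that producer/consumer pairing (the DONE sentinel is an invention of this example, not a FIFO feature):

```bash
#!/usr/bin/env bash
fifo=$(mktemp -u)          # path only; mkfifo creates the file
mkfifo "$fifo"
trap 'rm -f "$fifo"' EXIT

# Producer runs in the background, writing status lines
{
    echo "web-01 deployed"
    echo "web-02 deployed"
    echo "DONE"
} > "$fifo" &

# Consumer blocks on the FIFO until lines arrive
while IFS= read -r status; do
    [[ "$status" == "DONE" ]] && break
    echo "got: $status"
done < "$fifo"
wait
```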

Coprocesses (Bash 4.0+)

Bash's least-known feature. A coprocess is a background process with bidirectional pipes:

coproc DEPLOY { bash -c '
    while IFS= read -r host; do
        echo "Deploying to ${host}..."
        sleep 1
        echo "OK:${host}"
    done
'; }

# Write to the coprocess
echo "web-01" >&"${DEPLOY[1]}"
echo "web-02" >&"${DEPLOY[1]}"

# Read from it
read -r result <&"${DEPLOY[0]}"
echo "${result}"  # Deploying to web-01...

Trivia: Despite being over 15 years old (added in Bash 4.0, 2009), coproc remains so obscure that many experienced Bash programmers have never used it. For most use cases, named pipes or process substitution are simpler.


Part 9: Lock Files and flock — Cron Job Safety

The inherited deploy script ran twice simultaneously because two cron jobs overlapped. This is how you prevent that.

The naive approach (broken)

# BROKEN: race condition between check and create
if [[ -f /var/run/deploy.lock ]]; then
    echo "Already running" >&2
    exit 1
fi
echo $$ > /var/run/deploy.lock

Between the -f check and the echo, another process can slip through. This is a textbook TOCTOU (time-of-check to time-of-use) race.

The atomic approach: mkdir

LOCKDIR="/var/run/deploy.lock"

if ! mkdir "${LOCKDIR}" 2>/dev/null; then
    echo "Already running (lock: ${LOCKDIR})" >&2
    exit 1
fi
trap 'rm -rf "${LOCKDIR}"' EXIT

Under the Hood: mkdir is atomic on all POSIX filesystems. The kernel creates the directory in a single syscall that either succeeds or fails — there's no race window. This is why mkdir works as a lock primitive even though touch + check doesn't.

The right approach: flock

#!/usr/bin/env bash
set -euo pipefail

LOCKFILE="/var/run/deploy.lock"
exec 200>"${LOCKFILE}"

if ! flock -n 200; then
    echo "Deploy already running" >&2
    exit 1
fi

# Lock is held for the lifetime of fd 200
# It releases automatically when the script exits
echo "Starting deploy..."

Why flock over mkdir:

  • Automatic release. If the script crashes, the OS closes the file descriptor and the lock releases. mkdir locks persist after crashes.
  • Blocking mode. flock (without -n) waits for the lock. mkdir is try-or-fail only.
  • No cleanup needed. No trap required for lock release.

The cron pattern:

# In crontab — flock wraps the entire command
*/5 * * * * /usr/bin/flock -n /var/run/deploy.lock /opt/scripts/deploy.sh

One line. No lock management code in the script at all.

Gotcha: flock uses advisory locking via the flock(2) syscall. This means it only works if all competing processes use flock on the same file. A process that doesn't use flock can still write to the file. Also, flock doesn't work reliably on some NFS mounts — if your lock file is on NFS, test carefully or use mkdir instead.


Flashcard Check: Pipes, Locks, Scope

| Question | Answer |
| --- | --- |
| Why does a variable set inside cmd \| while read disappear? | The pipe creates a subshell; variables die with it |
| flock -n vs flock (no flag) | -n fails immediately if locked; without it, blocks until the lock is available |
| Why is mkdir atomic for locking but touch isn't? | mkdir is a single syscall that fails if the directory exists; touch + check is two operations with a race window |
| What is a FIFO? | A named pipe — a file that acts as a pipe between unrelated processes |

Part 10: Signal Handling — Graceful Shutdown

Your deploy script is iterating through 50 hosts. Someone sends SIGTERM (or a container orchestrator does, because it's shutting down the pod). What happens?

Default behavior vs. trapped behavior

| Signal | Number | Default action | What you want |
| --- | --- | --- | --- |
| SIGINT | 2 | Terminate | Finish current host, then stop |
| SIGTERM | 15 | Terminate | Finish current host, clean up, exit |
| SIGQUIT | 3 | Core dump | Almost never trap this |
| SIGKILL | 9 | Terminate (uncatchable) | Nothing — you can't trap it |
| SIGUSR1 | 10 | Terminate | Status report, log dump |

The graceful shutdown pattern

CURRENT_HOST=""
SHUTDOWN=false

graceful_shutdown() {
    echo "Shutdown requested. Finishing ${CURRENT_HOST:-nothing}..." >&2
    SHUTDOWN=true
}
trap graceful_shutdown TERM INT

for host in "${HOSTS[@]}"; do
    [[ "${SHUTDOWN}" == true ]] && break
    CURRENT_HOST="${host}"
    deploy_to_host "${host}"
done

CURRENT_HOST=""
if [[ "${SHUTDOWN}" == true ]]; then
    echo "Stopped early. Completed hosts logged to ${WORK_DIR}/completed.txt" >&2
    exit 130  # Convention: 128 + signal number (SIGINT=2)
fi

Remember: Exit code convention for signals: 128 + signal number. SIGINT (2) = exit 130. SIGTERM (15) = exit 143. This is how parent processes (including systemd and Kubernetes) know why a child exited.
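You can observe the convention directly (a sketch; the || guard keeps it safe under set -e):

```bash
#!/usr/bin/env bash
sleep 10 &
pid=$!
kill -TERM "$pid"
wait "$pid" || status=$?

echo "exit status: ${status}"    # 143 = 128 + 15 (SIGTERM)
```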


Part 11: Debugging — Beyond echo

set -x and custom PS4

# Basic trace
set -x
# Every command is printed before execution, prefixed with +

# Better: custom PS4 shows file, line, and function
export PS4='+(${BASH_SOURCE[0]}:${LINENO}): ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
set -x

Output becomes:

+(deploy.sh:47): main(): curl -sf https://api.internal/health
+(deploy.sh:48): main(): [[ 200 == 200 ]]

The DEBUG trap

trap 'echo "TRACE: ${BASH_COMMAND}" >&2' DEBUG

Fires before every simple command. Combined with BASH_COMMAND (the about-to-execute command), you get a trace without set -x noise. The bashdb project uses this mechanism to implement a full step-through debugger entirely in Bash.

Syntax check without running

bash -n deploy.sh

Catches syntax errors (unmatched quotes, bad if/fi pairing) without executing anything. Fast, safe, belongs in your CI pipeline.

ShellCheck — the lint that catches what you miss

shellcheck -s bash deploy.sh

ShellCheck is not a style checker. It catches real bugs: unquoted variables (SC2086), unused variables (SC2034), local masking exit codes (SC2155), POSIX compatibility issues, and dozens more. If you write Bash for production and don't use ShellCheck, you're flying blind.


Part 12: Performance — Why Your Read Loop Is Slow

The fork problem

Every external command (sed, awk, grep, basename, cut) creates a new process. fork() + exec() is cheap for one call. It's devastating in a loop.

Before (slow — 10,000 forks):

while IFS= read -r filepath; do
    filename=$(basename "${filepath}")
    extension=$(echo "${filepath}" | sed 's/.*\.//')
    echo "${filename} has extension ${extension}"
done < file-list.txt

After (fast — zero forks):

while IFS= read -r filepath; do
    filename="${filepath##*/}"
    extension="${filepath##*.}"
    echo "${filename} has extension ${extension}"
done < file-list.txt

mapfile/readarray — bulk read

# Instead of a while-read loop to build an array:
mapfile -t HOSTS < hosts.txt

# With a command:
mapfile -t ERRORS < <(grep ERROR /var/log/app.log)

# Now ERRORS is a proper array
echo "Found ${#ERRORS[@]} errors"

mapfile (alias readarray, added in Bash 4.0) reads lines into an array in a single operation. No loop, no subshell, no line-by-line overhead.

printf vs echo

# echo is non-portable and has gotchas:
echo -e "tab\there"    # Works in bash, not in some sh implementations
echo -n "no newline"   # Works in bash, not in all echo implementations

# printf is POSIX, predictable, and faster for formatted output:
printf 'tab\there\n'
printf '%s\n' "no newline issues"
printf '%-20s %s\n' "${host}" "${status}"  # Formatted table

Under the Hood: echo behavior varies between shells, between versions of the same shell, and even between builds (depending on compile-time options like --enable-xpg-echo). printf is specified by POSIX and behaves identically everywhere. In production scripts, use printf for anything more complex than a simple string.


Part 13: Anti-Patterns — Things That Work Until They Don't

Parsing ls

# BROKEN: fails on filenames with spaces, newlines, or special chars
for file in $(ls /var/log/*.log); do
    echo "Processing ${file}"
done

# CORRECT: use globbing directly
for file in /var/log/*.log; do
    [[ -e "${file}" ]] || continue  # Handle no-match case
    echo "Processing ${file}"
done

Remember: Set shopt -s nullglob to make globs expand to nothing (instead of the literal pattern) when no files match. Without it, /var/log/*.xyz iterates once with the literal string "/var/log/*.xyz" as the value.
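A sketch you can run against an empty directory to see both behaviors:

```bash
#!/usr/bin/env bash
d=$(mktemp -d)
trap 'rmdir "$d"' EXIT

files=("$d"/*.log)
echo "without nullglob: ${#files[@]} entry"    # 1 — the literal pattern

shopt -s nullglob
files=("$d"/*.log)
echo "with nullglob: ${#files[@]} entries"     # 0 — glob expands to nothing
```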

Testing with == inside [ ]

# Non-portable: == is a bashism inside [ ]
[ "$status" == "active" ]  # Works in bash, fails in dash/sh

# POSIX-correct:
[ "$status" = "active" ]   # Single = inside [ ]

# Best: use [[ ]] in bash scripts
[[ "$status" == "active" ]]  # Safe, supports patterns and regex too

Unquoted variables

# The war story: an unquoted variable that deleted the wrong directory
# This script was supposed to clean up old deploy artifacts:
DEPLOY_DIR=""   # Bug: variable was empty due to failed config read
rm -rf ${DEPLOY_DIR}/releases/*
# Became: rm -rf /releases/*
# Which deleted /releases/ at the filesystem root

War Story: A variant of this bug hit the Valve Steam client in 2015. The script contained rm -rf "$STEAMROOT/"* where STEAMROOT could be empty, potentially evaluating to rm -rf "/"*. The fix was simple: rm -rf "${STEAMROOT:?}/"* — the :? expansion causes the script to abort with an error if the variable is unset or empty, instead of proceeding with the deletion. GitHub issue #3671 on ValveSoftware/steam-for-linux has the full discussion.

Useless cat

# Useless cat — adds a process for no reason
cat file.txt | grep pattern

# Direct redirect
grep pattern file.txt

# Or, when you need multiple commands:
grep pattern < file.txt

This matters in loops and pipelines. One unnecessary cat in a loop processing 10,000 files is 10,000 unnecessary forks.


Part 14: Portability — When Bash Isn't Bash

Bashisms that break on dash/sh

Debian and Ubuntu use dash (not bash) as /bin/sh. If your shebang says #!/bin/sh but your script uses Bash features, it will fail silently or with cryptic errors.

| Bash feature | POSIX sh equivalent |
| --- | --- |
| [[ ]] | [ ] (with careful quoting) |
| (( )) arithmetic | $((expr)) or expr |
| ${var//pattern/replace} | sed or expr |
| Arrays | No equivalent (use positional parameters) |
| <(process substitution) | Temp files |
| <<< here strings | echo "$var" \| cmd |
| {1..10} brace expansion | seq 1 10 |
| [[ $x =~ regex ]] | echo "$x" \| grep -qE 'regex' |

# Always use the correct shebang:
#!/usr/bin/env bash    # For bash scripts (finds bash in PATH)
#!/bin/sh              # ONLY for POSIX-compatible scripts
Trivia: Bash deliberately extends POSIX sh syntax. When invoked as sh (via a symlink), Bash runs in POSIX-compatibility mode — argv[0] determines how the same binary behaves. That mode tightens various behaviors (startup files, some builtin semantics), but it does not remove extensions like [[, (( )), or arrays; bash-as-sh still accepts them. The real portability danger is systems where /bin/sh is dash, which genuinely lacks those features — so a bashism under #!/bin/sh may work on one machine and fail cryptically on the next.


Flashcard Check: Anti-Patterns and Portability

| Question | Answer |
| --- | --- |
| Why shouldn't you parse ls output? | Filenames with spaces/newlines/special chars break the pipeline |
| What's wrong with [ "$x" == "y" ]? | == inside [ ] is a bashism; use = for POSIX compatibility |
| ${DEPLOY_DIR:?must be set} — what does it do? | Exits with an error message if DEPLOY_DIR is unset or empty |
| Name a Bash feature that doesn't work in dash | Any of: [[, (( )), arrays, process substitution, <<< |

Part 15: Real Patterns — Production Recipes

Retry with exponential backoff

retry() {
    local max_attempts=$1; shift
    local delay=$1; shift
    local attempt=1

    while (( attempt <= max_attempts )); do
        if "$@"; then
            return 0
        fi
        echo "Attempt ${attempt}/${max_attempts} failed. Retrying in ${delay}s..." >&2
        sleep "${delay}"
        delay=$(( delay * 2 ))
        (( attempt++ ))
    done

    echo "All ${max_attempts} attempts failed" >&2
    return 1
}

# Usage: retry <max_attempts> <initial_delay_seconds> <command...>
retry 5 2 curl -sf "https://api.internal/health"

Parallel execution with controlled concurrency

MAX_PARALLEL=10

for host in "${HOSTS[@]}"; do
    deploy_to_host "${host}" &

    # Throttle: if we've hit the limit, wait for one to finish
    if (( $(jobs -r | wc -l) >= MAX_PARALLEL )); then
        wait -n  # Wait for any one job to finish (Bash 4.3+)
    fi
done
wait  # Wait for all remaining jobs

echo "All deployments complete"

Safe temp file management

WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/deploy.XXXXXXXXXX")
trap 'rm -rf "${WORK_DIR}"' EXIT

# All temp files go in WORK_DIR — one cleanup handles everything
DEPLOY_LOG="${WORK_DIR}/deploy.log"
HOST_LIST="${WORK_DIR}/hosts.txt"
RESULTS="${WORK_DIR}/results.json"

Exercises

Exercise 1: Fix the broken deploy script (5 minutes)

This script has 5 bugs. Find and fix them all:

#!/bin/sh
# deploy.sh - Deploy to all hosts

DEPLOY_DIR=$DEPLOY_BASE/releases
HOSTS=$(cat hosts.txt)

for host in $HOSTS; do
    result=$(ssh $host "systemctl restart app")
    if [ $result == "ok" ]; then
        echo "Success: $host"
    fi
done

rm -rf $DEPLOY_DIR/old/*
Bugs and fixes:

1. `#!/bin/sh` but the script may use bashisms — use `#!/usr/bin/env bash`
2. No `set -euo pipefail`
3. `$DEPLOY_DIR` unquoted and `$DEPLOY_BASE` might be empty — `rm -rf` risk
4. `$HOSTS` unquoted — word splitting breaks on hostnames with special chars
5. `[ $result == "ok" ]` — unquoted `$result` and `==` inside `[ ]`

Fixed version:
#!/usr/bin/env bash
set -euo pipefail

DEPLOY_DIR="${DEPLOY_BASE:?DEPLOY_BASE must be set}/releases"
mapfile -t HOSTS < hosts.txt

for host in "${HOSTS[@]}"; do
    if ssh "${host}" "systemctl restart app"; then
        echo "Success: ${host}"
    fi
done

rm -rf "${DEPLOY_DIR:?}/old/"*

Exercise 2: Write a locking wrapper (10 minutes)

Write a function with_lock that takes a lock file path and a command, runs the command with flock, and returns the command's exit code. It should fail fast if the lock is held.

Solution
with_lock() {
    local lockfile=$1; shift
    local fd
    exec {fd}>"${lockfile}"

    if ! flock -n "${fd}"; then
        echo "Lock held: ${lockfile}" >&2
        exec {fd}>&-
        return 1
    fi

    local rc=0
    "$@" || rc=$?

    # fds opened with exec outlive the function — close explicitly to release
    exec {fd}>&-
    return "${rc}"
}

# Usage:
with_lock /var/run/deploy.lock deploy_to_all_hosts

Exercise 3: Build a retry-with-logging wrapper (15 minutes)

Combine the retry pattern from Part 15 with the structured logging pattern. The wrapper should log each attempt, the delay before retry, and the final success/failure. Use an associative array to track attempt timestamps and results.

Hint Use `declare -A ATTEMPT_LOG`, `date +%s` for timestamps, and the `log` function pattern from the primer. The retry function should populate the associative array as it goes.

Cheat Sheet

Strict mode

| Setting | What it does | When it bites |
| --- | --- | --- |
| set -e | Exit on non-zero return | Silent in if, &&, \|\|, local x=$(...) |
| set -u | Exit on unset variable | Breaks $@ in Bash <4.4; use ${VAR:-} for optional vars |
| set -o pipefail | Pipeline returns the rightmost non-zero exit status | Changes exit code of grep \| wc pipelines |
| IFS=$'\n\t' | Word-split on newline/tab only | Doesn't replace quoting |

Parameter expansion

| Syntax | Result (for f="/a/b.tar.gz") | Replaces |
| --- | --- | --- |
| ${f##*/} | b.tar.gz | basename |
| ${f%/*} | /a | dirname |
| ${f%.*} | /a/b.tar | Remove last extension |
| ${f%%.*} | /a/b | Remove all extensions |
| ${f##*.} | gz | Get extension |
| ${f/tar/zip} | /a/b.zip.gz | sed substitution |
| ${f^^} | /A/B.TAR.GZ | tr [:lower:] [:upper:] |
| ${#f} | 11 | wc -c |

Traps

| Trap | Fires when | Common use |
| --- | --- | --- |
| EXIT | Script exits (any reason except SIGKILL) | Temp file cleanup |
| ERR | Command fails (same exceptions as set -e) | Error logging |
| INT | Ctrl+C (SIGINT) | Graceful shutdown |
| TERM | SIGTERM received | Graceful shutdown |
| DEBUG | Before every command | Tracing, profiling |

Locking

| Method | Atomic? | Auto-release on crash? | Blocking mode? |
| --- | --- | --- | --- |
| test -f && touch | No (race condition) | No | No |
| mkdir | Yes | No | No |
| flock | Yes | Yes (fd closes) | Yes |

Debugging

| Tool | Command | What it shows |
| --- | --- | --- |
| Trace | set -x | Every command before execution |
| Custom trace | PS4='+(${BASH_SOURCE}:${LINENO}): ' | File:line in trace output |
| Syntax check | bash -n script.sh | Parse errors without running |
| Lint | shellcheck -s bash script.sh | Real bugs, not just style |
| Profiling | trap 'echo "$(date +%s.%N) $BASH_COMMAND"' DEBUG | Per-command timestamps |

Takeaways

  • set -euo pipefail is necessary but not sufficient. Know the six contexts where -e is ignored, especially local x=$(cmd). Separate declaration from assignment.

  • trap cleanup EXIT is non-negotiable. Every script that creates temp files or acquires locks needs it. The EXIT trap fires on normal exit, errors, SIGINT, and SIGTERM.

  • Parameter expansion replaces five commands. ${f##*/}, ${f%.*}, ${f:-default} — learn these and your scripts get faster and cleaner overnight.

  • Pipes create subshells; redirections don't. If a variable set inside a while read loop disappears, you piped into the loop. Use < file or < <(cmd) instead.

  • Use flock for cron jobs. One line in crontab (flock -n /path/to/lock command) prevents the entire class of "script ran twice" bugs.

  • ShellCheck isn't optional. It catches bugs that experts miss. Run it in CI.