Advanced Bash for Ops - Street-Level Ops

Real-world patterns and debugging techniques for production Bash scripting.

Quick Diagnosis Commands

# Debug a script without modifying it
bash -x ./myscript.sh              # Trace every command
bash -xv ./myscript.sh             # Trace + print each line before expansion
PS4='+(${BASH_SOURCE}:${LINENO}): ' bash -x ./myscript.sh  # Show file:line in trace

# Check syntax without running
bash -n ./myscript.sh

# Find which shell is actually running
readlink -f /proc/$$/exe
echo "${BASH_VERSION}"

# Profile script execution time
time ./myscript.sh
# Per-command timing via DEBUG trap
trap 'echo "$(date +%s.%N) $BASH_COMMAND"' DEBUG

Gotcha: Unquoted Variables

The single most common Bash bug in production:

# BROKEN: filename with spaces causes word splitting
file="/var/log/my app.log"
rm $file    # Runs: rm /var/log/my app.log (two arguments!)

# FIXED
rm "${file}"

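Rule: Always double-quote variable expansions. The main exceptions are inside [[ ]] and on the right side of a plain assignment, where Bash performs neither word splitting nor globbing. Even inside [[ ]], quote the right-hand side of == or != when you want a literal comparison, because an unquoted right-hand side is treated as a pattern.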
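
To see the splitting directly, count the arguments a command actually receives. A minimal sketch; count_args is a hypothetical helper introduced only for this illustration:

```bash
#!/usr/bin/env bash
# count_args (hypothetical helper) prints how many arguments it was given
count_args() { echo "$#"; }

file="/var/log/my app.log"

unquoted=$(count_args $file)       # word splitting: /var/log/my and app.log
quoted=$(count_args "${file}")     # one argument, spaces preserved

echo "unquoted=${unquoted} quoted=${quoted}"   # prints: unquoted=2 quoted=1
```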

Gotcha: Subshells Swallowing Variables

Pipes create subshells. Variables set in a subshell don't propagate:

# BROKEN: count stays 0
count=0
cat hosts.txt | while read -r host; do
    count=$(( count + 1 ))
done
echo "${count}"  # Still 0!

# FIXED: redirect instead of pipe
count=0
while read -r host; do
    count=$(( count + 1 ))
done < hosts.txt
echo "${count}"  # Correct
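
When the input comes from a command rather than a file, process substitution keeps the loop in the current shell. A minimal sketch, with printf standing in for whatever produces the host list (the host names are made up):

```bash
#!/usr/bin/env bash
shopt -u lastpipe   # the default; stated here to make the assumption explicit

# Piped loop: the while body runs in a subshell, so the parent's counter is untouched
piped_count=0
printf '%s\n' web1 web2 web3 | while read -r host; do
    piped_count=$(( piped_count + 1 ))
done

# Process substitution: the loop runs in the current shell
count=0
while read -r host; do
    count=$(( count + 1 ))
done < <(printf '%s\n' web1 web2 web3)

echo "piped=${piped_count} procsub=${count}"   # prints: piped=0 procsub=3
```

In Bash 4.2 and later, shopt -s lastpipe is another option for scripts: with job control off, it runs the last stage of a pipeline in the current shell.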

Gotcha: set -e and Command Substitution

Under the hood: local is itself a command, and its exit code masks the exit code of the assignment. When Bash evaluates local result=$(failing_command), the failing command returns non-zero, but local succeeds (it created the variable), so set -e sees a zero exit code and continues. This is one of the most common set -e surprises.

set -e doesn't catch failures inside $() in all contexts:

# This WILL exit on failure
result=$(failing_command)  # Exits due to set -e ✓

# This WON'T exit — assignment to local suppresses exit
local result=$(failing_command)  # Bug! Exit code masked by 'local'

# FIXED: separate declaration and assignment
local result
result=$(failing_command)
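
The masking is easy to observe without set -e by looking at the exit status each form leaves behind. A minimal sketch; the function names are hypothetical and false stands in for failing_command:

```bash
#!/usr/bin/env bash
# Declaration and assignment combined: $? is the status of 'local', not of false
masked() {
    local result=$(false)
    echo "$?"
}

# Declaration first, assignment second: the status of false survives
separated() {
    local result status=0
    result=$(false) || status=$?
    echo "${status}"
}

masked_status=$(masked)
separated_status=$(separated)
echo "combined=${masked_status} separated=${separated_status}"   # prints: combined=0 separated=1
```

The same masking applies to declare, export, and readonly when they are combined with a command substitution.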

Gotcha: Globbing in Variables

# DANGEROUS: variable contains glob chars
pattern="*.log"
rm ${pattern}  # Expands to all .log files in CWD, not the literal string

# If you want the literal string, quote it:
rm -- "${pattern}"  # No globbing: removes only a file literally named *.log

# For controlled globbing:
shopt -s nullglob  # Empty result instead of literal glob when no match
for f in /var/log/*.log; do
    echo "Processing: ${f}"
done
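
The three behaviors (unquoted expansion, quoted expansion, and nullglob) can be checked safely in a scratch directory by expanding into arrays instead of calling rm. A minimal sketch; the file names are made up:

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
touch "${demo_dir}/a.log" "${demo_dir}/b.log"
cd "${demo_dir}" || exit 1

pattern="*.log"
unquoted=( ${pattern} )      # pathname expansion happens: a.log b.log
quoted=( "${pattern}" )      # stays the literal string *.log

shopt -s nullglob
nomatch=( *.missing )        # no match: expands to nothing, so the array is empty
shopt -u nullglob

cd - >/dev/null || exit 1
rm -rf "${demo_dir}"

echo "unquoted=${#unquoted[@]} quoted=${quoted[0]} nomatch=${#nomatch[@]}"
```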

Pattern: Inventory-Driven Operations

# Parse a simple inventory file with groups
declare -A HOST_GROUPS
current_group="default"

while IFS= read -r line; do
    line="${line%%#*}"          # Strip comments
    line="${line// /}"          # Strip whitespace
    [[ -z "${line}" ]] && continue

    if [[ "${line}" == \[*\] ]]; then
        current_group="${line//[\[\]]/}"
        continue
    fi
    HOST_GROUPS["${current_group}"]+="${line} "
done < inventory.ini

# Use specific group
for host in ${HOST_GROUPS[webservers]}; do
    ssh "${host}" 'uptime'
done
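
For reference, here is the kind of file the parser expects, run end to end. This is a sketch under assumptions: the inventory contents and host names are invented, the file is written to a temp path instead of inventory.ini, and the brackets are stripped with two prefix/suffix removals as a variant of the substitution above.

```bash
#!/usr/bin/env bash
# Hypothetical inventory, written to a temp file for illustration
inventory=$(mktemp)
cat > "${inventory}" <<'EOF'
# fleet inventory
[webservers]
web1   # primary
web2

[databases]
db1
EOF

declare -A HOST_GROUPS
current_group="default"
while IFS= read -r line; do
    line="${line%%#*}"                         # strip comments
    line="${line// /}"                         # strip spaces
    [[ -z "${line}" ]] && continue
    if [[ "${line}" == \[*\] ]]; then
        current_group="${line#\[}"             # drop the leading [
        current_group="${current_group%\]}"    # drop the trailing ]
        continue
    fi
    HOST_GROUPS["${current_group}"]+="${line} "
done < "${inventory}"
rm -f "${inventory}"

# Split the space-separated value into an array without relying on unquoted expansion
read -ra web_hosts <<< "${HOST_GROUPS[webservers]}"
printf 'webservers: %s\n' "${web_hosts[@]}"
```

Reading the group into an array with read -ra keeps the later loop fully quoted ("${web_hosts[@]}"), which avoids accidental globbing if a host entry ever contains a wildcard character.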

Pattern: Progress Reporting

total=${#HOSTS[@]}
completed=0
failed=0

report_progress() {
    local pct=$(( completed * 100 / total ))
    printf '\r[%3d%%] %d/%d done, %d failed' \
        "${pct}" "${completed}" "${total}" "${failed}"
}

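for host in "${HOSTS[@]}"; do
    # Counters use x=$(( x + 1 )): (( x++ )) returns status 1 when x was 0,
    # which aborts the script under set -e
    if ssh -o ConnectTimeout=5 "${host}" 'uptime' &>/dev/null; then
        completed=$(( completed + 1 ))
    else
        failed=$(( failed + 1 ))
        completed=$(( completed + 1 ))
        FAILED_HOSTS+=("${host}")
    fi
    report_progress
done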
echo  # Newline after progress
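
A note on counters in loops like this one when the script runs under set -e: an arithmetic command (( expr )) returns status 1 whenever expr evaluates to 0, so a post-increment from 0 counts as a failure. A minimal sketch of the behavior (the variable names are illustrative):

```bash
#!/usr/bin/env bash
completed=0
status_post=0
(( completed++ )) || status_post=$?     # the expression's value is the old 0 -> status 1

n=0
status_assign=0
n=$(( n + 1 )) || status_assign=$?      # a plain assignment -> status 0

echo "post-increment status=${status_post}, assignment status=${status_assign}"
# prints: post-increment status=1, assignment status=0
```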

Pattern: Safe Temp Files

# Always use mktemp, never hardcoded paths
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/fleet-ops.XXXXXXXXXX")
trap 'rm -rf "${WORK_DIR}"' EXIT

# Per-host output files
for host in "${HOSTS[@]}"; do
    ssh "${host}" 'df -h' > "${WORK_DIR}/${host}.out" 2>&1
done

# Aggregate results
cat "${WORK_DIR}"/*.out | sort -k5 -rn | head -20
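
To convince yourself the EXIT trap really cleans up, run the pattern in a subshell and check the directory afterwards. A minimal sketch; the echo into host1.out is a stand-in for the per-host ssh output:

```bash
#!/usr/bin/env bash
# The command substitution is a subshell, so its EXIT trap fires when it ends
leftover=$(
    WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/fleet-ops.XXXXXXXXXX")
    trap 'rm -rf "${WORK_DIR}"' EXIT
    echo "sample output" > "${WORK_DIR}/host1.out"
    echo "${WORK_DIR}"       # report the path back to the parent shell
)

if [[ -d "${leftover}" ]]; then
    cleanup_result="still present"
else
    cleanup_result="removed"
fi
echo "${leftover}: ${cleanup_result}"
```

The trap does not run if the shell is killed with SIGKILL, so long-lived fleet hosts may still need a periodic sweep of stale fleet-ops.* directories.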

Pattern: SSH Multiplexing for Fleet Operations

# Set up a control socket for reuse
SSH_OPTS=(
    -o ConnectTimeout=5
    -o StrictHostKeyChecking=accept-new
    -o ControlMaster=auto
    -o ControlPath="${HOME}/.ssh/cm-%r@%h:%p"   # keep control sockets out of world-writable /tmp
    -o ControlPersist=300
)

# First connection establishes the master
ssh "${SSH_OPTS[@]}" "${host}" 'uptime'

# Subsequent connections reuse the TCP connection (near-instant)
ssh "${SSH_OPTS[@]}" "${host}" 'free -m'
ssh "${SSH_OPTS[@]}" "${host}" 'df -h /'

Pattern: Rolling Operations with Circuit Breaker

# Assumes log, process_host, E_PARTIAL, HOSTS, and FAILED_HOSTS are defined elsewhere in the script
MAX_FAILURES=3
failure_count=0
processed=0

for host in "${HOSTS[@]}"; do
    if (( failure_count >= MAX_FAILURES )); then
        log ERROR "Circuit breaker: ${MAX_FAILURES} consecutive failures. Halting."
        log ERROR "Remaining hosts not processed: $(( ${#HOSTS[@]} - processed ))"
        exit "${E_PARTIAL}"
    fi

    processed=$(( processed + 1 ))
    if process_host "${host}"; then
        failure_count=0  # Reset on success
    else
        failure_count=$(( failure_count + 1 ))  # (( failure_count++ )) returns 1 at 0 and would trip set -e
        FAILED_HOSTS+=("${host}")
        log WARN "Failure ${failure_count}/${MAX_FAILURES} on ${host}"
    fi
done
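
The breaker logic can be exercised locally before pointing it at a fleet. The sketch below is a simulation under stated assumptions: the host names are invented, process_host is a stub that fails for any name starting with "bad", and the loop returns a status instead of calling exit so it can be tested.

```bash
#!/usr/bin/env bash
MAX_FAILURES=3
HOSTS=(ok1 bad1 ok2 bad2 bad3 bad4 ok3)   # hypothetical fleet
FAILED_HOSTS=()
PROCESSED=()

# Stub: records the host, then fails for names that start with "bad"
process_host() {
    PROCESSED+=("$1")
    [[ "$1" != bad* ]]
}

run_rolling() {
    local failure_count=0 host
    for host in "${HOSTS[@]}"; do
        if (( failure_count >= MAX_FAILURES )); then
            return 1                       # breaker tripped: stop before touching this host
        fi
        if process_host "${host}"; then
            failure_count=0                # a success resets the streak
        else
            failure_count=$(( failure_count + 1 ))
            FAILED_HOSTS+=("${host}")
        fi
    done
    # Also report a trip when the streak ends exactly on the last host
    (( failure_count < MAX_FAILURES ))
}

if run_rolling; then breaker="closed"; else breaker="tripped"; fi
echo "breaker=${breaker} processed=${#PROCESSED[@]} failed=${#FAILED_HOSTS[@]}"
# prints: breaker=tripped processed=6 failed=4
```

Note that bad1 does not count toward the trip because ok2 resets the streak; only bad2, bad3, and bad4 are consecutive, so ok3 is never touched.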

Emergency: Debugging a Hung Script

# Find the script process
pgrep -f myscript.sh
ps auxf | grep myscript

# See what it's doing right now
strace -p <PID> -e trace=network,write
cat /proc/<PID>/wchan          # What kernel function it's waiting in
ls -la /proc/<PID>/fd/         # What files/sockets it has open

# See the script's environment
cat /proc/<PID>/environ | tr '\0' '\n'

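# Send it a signal to trigger its trap handler.
# Only do this if the script installed a trap for USR1: the default action for SIGUSR1 terminates the process.
kill -USR1 <PID>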
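
For the signal to be useful, the script has to install a handler ahead of time. A minimal sketch of such a handler; dump_status, current_host, and last_dump are hypothetical names, and the script signals itself to simulate the operator's kill:

```bash
#!/usr/bin/env bash
current_host="web42"      # hypothetical state the script keeps up to date
last_dump=""

# Handler: report where the script is without stopping it
dump_status() {
    last_dump="pid=${BASHPID} current_host=${current_host}"
    echo "STATUS: ${last_dump}" >&2
}
trap dump_status USR1

kill -USR1 "${BASHPID}"   # simulates an operator running: kill -USR1 <PID>
:                         # the trap has run by the time the next command starts
```

Bash runs a trap only after the current foreground command finishes, so a script blocked inside a long ssh call will not report until that command returns.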

Emergency: Recovering from a Botched Fleet Script

# 1. Kill the runaway process
pkill -f fleet-patch.sh

# 2. Check which hosts were touched
grep 'Processing' /var/log/fleet-ops.log | tail -50

# 3. Identify partial state
for host in $(awk '/FAILED/ {print $NF}' /var/log/fleet-ops.log); do
    echo "--- ${host} ---"
    ssh "${host}" 'systemctl status nginx; rpm -q nginx' 2>&1
done

# 4. Clean up lock files
rm -rf /var/run/fleet-patch.lock

Useful One-Liners for Ops

# Find large files modified in last 24h
find / -xdev -type f -mtime -1 -size +100M -printf '%s %p\n' 2>/dev/null | sort -rn

# Watch for OOM kills in real-time
dmesg -wH | grep -i 'oom\|killed process'

# Quick disk usage sorted by size
du -xsh /* 2>/dev/null | sort -rh | head -15

# Check all systemd units in failed state
systemctl list-units --state=failed --no-legend --no-pager

# Parallel ping sweep (&> is a bashism that sh may misparse, so use >/dev/null 2>&1)
printf '%s\n' 10.0.1.{1..254} | xargs -P 50 -I{} sh -c 'ping -c1 -W1 "$1" >/dev/null 2>&1 && echo "$1 up"' _ {}

Power One-Liners

Bash tricks, shortcuts, and patterns that separate beginners from power users.

Re-run last command as root

sudo !!

Breakdown: !! is bash history expansion for "the entire previous command." Bash expands it before execution, so sudo wraps whatever you just ran. The expanded command is echoed before running.

[!TIP] When to use: You ran a privileged command without sudo. Muscle memory saver.

Quick fix typo in previous command

^typo^fix

Breakdown: Bash ^old^new substitution — shorthand for !!:s/old/new/. Only replaces the first occurrence. For global replace: !!:gs/old/new/.

[!TIP] When to use: Fat-fingered a path, hostname, or flag. Faster than up-arrow and editing.

Open $EDITOR for complex command composition

ctrl-x ctrl-e

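Breakdown: Readline shortcut (edit-and-execute-command) that opens the current command line in an editor; Bash tries $VISUAL, then $EDITOR, then emacs. Write your multi-line pipeline, save and quit, and it executes. Bound by default in bash; zsh provides the same feature through its edit-command-line widget, which you have to load and bind yourself.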

[!TIP] When to use: Building complex pipelines, writing inline awk scripts, anything longer than ~80 chars.

Cycle through previous command arguments

ALT+.    # or ESC .

Breakdown: Inserts the last argument of the previous command. Press repeatedly to cycle through last arguments of earlier commands. ESC . works when ALT is captured by terminal emulator.

[!TIP] When to use: Reusing file paths, hostnames, or long arguments across sequential commands.

Kill/yank line for mid-command research

ctrl-u  # ...check something... then:
ctrl-y  # paste it back

Breakdown: ctrl-u kills (cuts) from cursor to beginning of line into the kill ring. ctrl-y yanks (pastes) it back. The line is preserved even after running other commands in between.

[!TIP] When to use: Halfway through typing a command and need to check a path, PID, or hostname first.

Most frequently used commands

history | awk '{a[$2]++} END {for(i in a) print a[i], i}' | sort -rn | head -20

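Breakdown: Classic awk frequency counter on field 2 (the command name in default history output; field 1 is the entry number). Reveals your actual workflow patterns, which is useful for identifying alias candidates. If HISTTIMEFORMAT is set, the timestamp fields come before the command, so adjust the field number (with "%F %T " the command name is field 4).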
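
The counter is easy to try on canned input. A minimal sketch; the sample text imitates default history output and the commands in it are made up:

```bash
#!/usr/bin/env bash
# Imitation of default `history` output: entry number, then the command
sample_history='    1  ls -l
    2  git status
    3  ls
    4  git pull
    5  git log'

# Same awk counter as above, fed from the sample instead of the history builtin
top=$(printf '%s\n' "${sample_history}" \
    | awk '{a[$2]++} END {for (i in a) print a[i], i}' \
    | sort -rn | head -1)

echo "${top}"   # prints: 3 git
```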

[!TIP] When to use: Optimizing your shell workflow, creating aliases, identifying automation candidates.

Brace expansion for quick rename/backup

mv config.yaml{,.bak}        # config.yaml -> config.yaml.bak
cp app.py{,.$(date +%s)}     # timestamped backup
mkdir -p project/{src,tests,docs,scripts}  # scaffold dirs

Breakdown: Bash expands {a,b} before the command runs. Empty element in {,.bak} means the original name is the first arg, .bak appended is the second. Works with any command.
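
Prefixing the command with echo shows what the shell will actually pass before anything is moved or created. A minimal sketch using the same file names as above:

```bash
#!/usr/bin/env bash
# echo receives the already-expanded words, exactly as mv or mkdir would
backup_args=$(echo config.yaml{,.bak})
scaffold_args=$(echo project/{src,tests,docs,scripts})

echo "${backup_args}"     # prints: config.yaml config.yaml.bak
echo "${scaffold_args}"   # prints: project/src project/tests project/docs project/scripts
```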

[!TIP] When to use: Quick backups before editing, project scaffolding, batch file operations.

Process substitution for comparing outputs

diff <(sort file1) <(sort file2)
diff <(ssh host1 cat /etc/config) <(ssh host2 cat /etc/config)
diff <(cd dir1 && find . | sort) <(cd dir2 && find . | sort)

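Breakdown: <(cmd) runs cmd and expands to a file name (typically /dev/fd/N, or a named pipe) that reads from the command's stdout. Lets you diff, comm, or paste the outputs of two commands without temp files. The file is a pipe, so it can be read only once and is not seekable. Works in bash/zsh, not in POSIX sh.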
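
The same mechanism works with comm, which needs two sorted inputs. A minimal sketch; the two printf lists stand in for real command output such as package lists from two hosts:

```bash
#!/usr/bin/env bash
# What the shell substitutes for <(...): a path the command can open and read
fd_path=$(echo <(true))
echo "process substitution expands to: ${fd_path}"

# Lines present only in the second list (comm -13 hides columns 1 and 3)
only_in_second=$(comm -13 \
    <(printf '%s\n' alpha beta gamma | sort) \
    <(printf '%s\n' beta gamma delta | sort))

echo "${only_in_second}"   # prints: delta
```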

[!TIP] When to use: Comparing config across hosts, validating deployments, diffing directory trees.

Delete files NOT matching a pattern

shopt -s extglob
rm !(*.log|*.conf|*.yaml)

Breakdown: extglob enables extended pattern matching. !(pattern) matches everything except the pattern. Without extglob, bash doesn't support negation in globs.
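
Because the pattern feeds rm, it is worth previewing the match list first. A minimal sketch in a scratch directory; the file names are invented, and the expansion is captured in an array instead of being passed to rm:

```bash
#!/usr/bin/env bash
shopt -s extglob     # must be enabled before the line using !( ) is parsed

demo_dir=$(mktemp -d)
touch "${demo_dir}/app.log" "${demo_dir}/nginx.conf" "${demo_dir}/core.1234" "${demo_dir}/notes.txt"

cd "${demo_dir}" || exit 1
doomed=( !(*.log|*.conf|*.yaml) )    # exactly what rm would receive
cd - >/dev/null || exit 1

printf 'would remove: %s\n' "${doomed[@]}"
rm -rf "${demo_dir}"
```

Here the array holds core.1234 and notes.txt, while app.log and nginx.conf are spared.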

[!TIP] When to use: Cleaning up a directory while preserving specific file types.

Quick directory stack navigation

pushd /var/log    # push current dir, cd to /var/log
pushd /etc        # push /var/log, cd to /etc
popd              # back to /var/log
dirs -v           # show numbered stack

Breakdown: pushd maintains a LIFO stack of directories. popd pops and cd's. dirs -v shows the stack with indices. cd ~2 jumps to stack position 2. Vastly superior to cd - for multi-dir workflows.
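
The stack positions are easier to follow when traced step by step. A minimal sketch using /usr, /etc, and /tmp as arbitrary example directories:

```bash
#!/usr/bin/env bash
start_dir=$PWD

pushd /usr >/dev/null     # stack: /usr  start_dir
pushd /etc >/dev/null     # stack: /etc  /usr  start_dir
pushd /tmp >/dev/null     # stack: /tmp  /etc  /usr  start_dir
popd >/dev/null           # stack: /etc  /usr  start_dir
after_popd=$PWD           # /etc

cd ~1                     # ~1 expands to stack entry 1, which is /usr
after_jump=$PWD           # /usr

echo "after popd: ${after_popd}, after cd ~1: ${after_jump}"

# Restore the starting state
cd "${start_dir}" || exit 1
dirs -c
```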

[!TIP] When to use: Bouncing between log dirs, config dirs, and source dirs during debugging.

Timestamped history for forensics

export HISTTIMEFORMAT="%F %T "

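Breakdown: When HISTTIMEFORMAT is set, Bash saves a timestamp with each history entry and the history builtin prints it in this strftime format. %F = YYYY-MM-DD, %T = HH:MM:SS. Add the line to .bashrc to make it persist across sessions. Entries recorded before the variable was set have no stored timestamp, so do not trust the times shown for them.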

[!TIP] When to use: Post-incident forensics — "what commands were run and when?" Also useful for compliance auditing.

Close shell keeping background jobs alive

disown -a && exit

Breakdown: disown -a removes ALL jobs from the shell's job table so they won't receive SIGHUP when the shell exits. Then exit closes cleanly. Alternative: start with nohup or use tmux/screen.

[!TIP] When to use: Started a long-running process (build, transfer, migration) and need to disconnect.

Stream fan-out — one pipeline, many consumers

some_command | tee >(gzip > compressed.gz) >(sha256sum > checksum.txt) >(wc -l > linecount.txt) | head

Breakdown: tee copies stdin to each >(cmd) process substitution while still passing the original stream to stdout for the final pipeline stage. Each >(cmd) runs in its own subshell concurrently. This is shell dataflow programming — you're building a mini DAG in one line.

[!TIP] When to use: Simultaneously compress, hash, count, and preview a data stream. Also: writing to multiple log sinks, fan-out to parallel processors, splitting CI build artifacts.

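Caveat: >(cmd) runs asynchronously, so the main pipeline may finish before all branches complete. Whether wait can wait on a process substitution depends on the Bash version, so for critical writes verify the output files before relying on them. Also note that head exits after ten lines; tee then gets SIGPIPE on its next write to stdout and stops, which truncates the other branches on long inputs. When the branches must see the full stream, end the pipeline with a consumer that reads everything, or use GNU tee's -p option, which is intended to keep tee writing to its remaining outputs.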

Compare state over time (diff as a time machine)

diff <(lsof -p 1234) <(sleep 10; lsof -p 1234)
diff <(ss -tn) <(sleep 30; ss -tn)
diff <(mount) <(sleep 5; mount)

Breakdown: First process substitution captures state now. Second captures state after a delay. diff shows what changed. Works with anything that produces text: ps, ss, netstat, iptables -L, mount, env.
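
When you need to trigger something between the two snapshots, capture them explicitly and diff afterwards. A minimal sketch; the scratch directory and the leaked.tmp file are invented stand-ins for whatever state is being watched:

```bash
#!/usr/bin/env bash
watch_dir=$(mktemp -d)
touch "${watch_dir}/app.pid"

before=$(ls "${watch_dir}")           # snapshot 1
touch "${watch_dir}/leaked.tmp"       # the change under investigation
after=$(ls "${watch_dir}")            # snapshot 2

# diff exits 1 when the inputs differ, so guard it for scripts running under set -e
changes=$(diff <(printf '%s\n' "${before}") <(printf '%s\n' "${after}") || true)
echo "${changes}"

rm -rf "${watch_dir}"
```

Lines prefixed with > exist only in the later snapshot, so the new file shows up as "> leaked.tmp".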

[!TIP] When to use: "Something is leaking file descriptors / connections / memory." Snapshot, wait, snapshot, diff. The poor man's profiler.

Duplicate a disk to multiple targets simultaneously

dd if=/dev/sda bs=4M | tee >(dd of=/dev/sdb bs=4M) | dd of=/dev/sdc bs=4M

Breakdown: dd reads the source once. tee + process substitution fans the stream to /dev/sdb while the main pipeline writes to /dev/sdc. One read pass, two write targets, concurrent.

[!TIP] When to use: Cloning a golden image to multiple drives, preparing identical servers.

Caveat: One typo in the of= targets and you're speedrunning regret. Triple-check device names with lsblk first.


Quick Reference