Portal | Level: L1: Foundations | Topics: Bash / Shell Scripting, Linux Fundamentals | Domain: Linux

Advanced Bash for Ops - Primer

Why This Matters

Bash is the lingua franca of infrastructure. Every server has it. Every CI pipeline runs it. Every runbook assumes it. Most DevOps engineers can write a basic script, but production Bash — the kind that runs unattended at 3 AM across 1,500 servers — requires discipline that casual scripting never teaches. Badly written Bash is among the most common sources of self-inflicted outages in operations teams.

Core Principles

1. Defensive Defaults

Every production script should start with strict mode:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

| Flag | Effect |
| --- | --- |
| `-e` | Exit immediately when a command returns a non-zero status |
| `-u` | Treat expansion of an unset variable as an error |
| `-o pipefail` | A pipeline returns the exit status of the last failing command, rather than that of the last command in the pipeline |

Without these, a script can silently fail halfway through and continue executing destructive operations on stale state.
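
A minimal, runnable illustration of what `pipefail` changes, using only shell built-ins:

```shell
#!/usr/bin/env bash
# Without pipefail, a pipeline's status is that of its LAST command,
# so an early failure is invisible to `set -e`.
set -e
false | cat
echo "survived: early pipeline failure went unnoticed"

# With pipefail, the same pipeline now reports the failure.
set -o pipefail
if false | cat; then
    echo "unreachable"
else
    echo "pipefail surfaced exit status $?"
fi
```

Run it and note that the first pipeline sails straight through despite `set -e`.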

Remember: Mnemonic for set -euo pipefail: "E-U-P: Exit on errors, Unset vars are errors, Pipes fail properly." Think of it as the safety harness for production Bash. ShellCheck complements it by flagging related hazards such as unquoted expansions (SC2086) and variables referenced but never assigned (SC2154).

2. Trap Handlers

Clean up resources when scripts exit — whether normally, on error, or on signal:

TMPDIR=$(mktemp -d)
LOCKFILE="/var/run/myprocess.lock"

cleanup() {
    rm -rf "${TMPDIR}"
    rm -f "${LOCKFILE}"
    echo "Cleaned up at $(date)" >> /var/log/myscript.log
}
trap cleanup EXIT

on_error() {
    local line=$1
    echo "ERROR: Script failed at line ${line}" >&2
    # Send alert, log to syslog, etc.
    logger -t myscript "FAILED at line ${line}"
}
# Add set -E (errtrace) if the ERR trap should also fire inside functions
# and subshells; without it, the trap is not inherited there.
trap 'on_error ${LINENO}' ERR

3. Lock Files

Prevent concurrent execution when scripts modify shared state:

LOCKFILE="/var/run/fleet-patch.lock"

acquire_lock() {
    if ! mkdir "${LOCKFILE}" 2>/dev/null; then
        local pid
        pid=$(cat "${LOCKFILE}/pid" 2>/dev/null || echo "unknown")
        echo "Lock held by PID ${pid}. Exiting." >&2
        exit 1
    fi
    echo $$ > "${LOCKFILE}/pid"
}

release_lock() {
    rm -rf "${LOCKFILE}"
}

acquire_lock
trap release_lock EXIT

Under the hood: mkdir is atomic on all POSIX filesystems because the kernel creates the directory in a single syscall that either succeeds or fails -- there's no race window. This makes it a reliable cross-platform lock primitive. The flock command uses the flock(2) syscall, which is kernel-level advisory locking -- faster and more robust, but not available on all systems (notably missing on some NFS mounts).

The mkdir approach works everywhere; prefer flock where it is available:

exec 200>/var/run/myscript.lock
flock -n 200 || { echo "Already running"; exit 1; }

Structured Logging

Production scripts need parseable output, not ad-hoc echo statements:

readonly LOG_FILE="/var/log/fleet-ops.log"
SCRIPT_NAME=$(basename "$0")
readonly SCRIPT_NAME   # assign before readonly so a failing substitution isn't masked (SC2155)

log() {
    local level=$1; shift
    local msg="$*"
    local ts
    ts=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
    printf '%s [%s] %s: %s\n' "${ts}" "${level}" "${SCRIPT_NAME}" "${msg}" | tee -a "${LOG_FILE}"
}

log INFO "Starting fleet patch cycle"
log WARN "Host db-03 unreachable, skipping"
log ERROR "Patch failed on web-12: exit code 137"

Argument Parsing

Use getopts for simple flags, or a manual loop for long options:

usage() {
    cat <<EOF
Usage: ${0##*/} [-n] [-v] [-t TIMEOUT] [-h] HOST_PATTERN
  -n          Dry run (no changes)
  -v          Verbose output
  -t TIMEOUT  SSH timeout in seconds (default: 10)
  -h          Show this help
EOF
    exit 1
}

DRY_RUN=false
VERBOSE=false
TIMEOUT=10

while getopts ":nvt:h" opt; do
    case ${opt} in
        n) DRY_RUN=true ;;
        v) VERBOSE=true ;;
        t) TIMEOUT=${OPTARG} ;;
        h) usage ;;
        :) echo "Option -${OPTARG} requires an argument" >&2; usage ;;
        \?) echo "Unknown option -${OPTARG}" >&2; usage ;;
    esac
done
shift $((OPTIND - 1))

[[ $# -lt 1 ]] && { echo "HOST_PATTERN required" >&2; usage; }
HOST_PATTERN=$1

Arrays and Iteration

Bash arrays are essential for handling lists of hosts, files, or arguments safely:

# Declare arrays
declare -a HOSTS=()
declare -a FAILED=()
declare -a SKIPPED=()

# Build host list from inventory
while IFS= read -r host; do
    [[ -z "${host}" || "${host}" == \#* ]] && continue
    HOSTS+=("${host}")
done < /etc/fleet/inventory.txt

# Iterate with index
for i in "${!HOSTS[@]}"; do
    host="${HOSTS[$i]}"
    echo "[$(( i + 1 ))/${#HOSTS[@]}] Processing ${host}..."
done

# Report
echo "Total: ${#HOSTS[@]}  Failed: ${#FAILED[@]}  Skipped: ${#SKIPPED[@]}"

Process Substitution and File Descriptors

# Compare two remote file listings without temp files
diff <(ssh host1 'ls /etc/configs/') <(ssh host2 'ls /etc/configs/')

# Redirect stdout and stderr to different files
exec 1>>/var/log/myscript.out
exec 2>>/var/log/myscript.err

# Tee to both log and stdout using fd 3
exec 3>&1
exec 1> >(tee -a /var/log/myscript.log >&3)

String Manipulation

Bash built-in string operations avoid forking to sed/awk for simple tasks:

# Parameter expansion
filename="/path/to/config.yaml.bak"
echo "${filename##*/}"        # config.yaml.bak  (basename)
echo "${filename%.*}"         # /path/to/config.yaml  (remove extension)
echo "${filename%%.*}"        # /path/to/config  (remove all extensions)
echo "${filename/yaml/json}"  # /path/to/config.json.bak  (substitution)

# Default values
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:=5432}"     # Also assigns if unset

# Length
echo "${#filename}"            # 24

Exit Codes as API

Define meaningful exit codes so callers can react programmatically:

readonly E_SUCCESS=0
readonly E_USAGE=1
readonly E_LOCK=2
readonly E_SSH=3
readonly E_TIMEOUT=4
readonly E_PARTIAL=5    # Some hosts succeeded, some failed

main() {
    # ... script logic ...
    if [[ ${#FAILED[@]} -gt 0 && ${#SUCCEEDED[@]} -gt 0 ]]; then
        exit ${E_PARTIAL}
    elif [[ ${#FAILED[@]} -gt 0 ]]; then
        exit ${E_SSH}
    fi
    exit ${E_SUCCESS}
}

Common Patterns

Retry with Backoff

retry() {
    local max_attempts=$1; shift
    local delay=$1; shift
    local attempt=1

    while (( attempt <= max_attempts )); do
        if "$@"; then
            return 0
        fi
        echo "Attempt ${attempt}/${max_attempts} failed. Retrying in ${delay}s..." >&2
        sleep "${delay}"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
    return 1
}

retry 3 5 ssh "${host}" 'systemctl restart nginx'

Parallel Execution with Controlled Concurrency

MAX_PARALLEL=10

run_parallel() {
    local -a pids=()
    for host in "${HOSTS[@]}"; do
        process_host "${host}" &
        pids+=($!)

        # Throttle
        if (( ${#pids[@]} >= MAX_PARALLEL )); then
            wait -n  # Wait for any one to finish (bash 4.3+)
            # Clean up finished pids
            local -a active=()
            for pid in "${pids[@]}"; do
                kill -0 "${pid}" 2>/dev/null && active+=("${pid}")
            done
            pids=("${active[@]}")
        fi
    done
    wait  # Wait for remaining
}

Here-Doc for Remote Commands

ssh -o ConnectTimeout=10 "${host}" bash -s <<'REMOTE'
    set -euo pipefail
    echo "Running on $(hostname)"
    systemctl status nginx
    df -h /var/log
REMOTE

Note: single-quoting 'REMOTE' prevents local variable expansion. Remove quotes to allow it.
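
A quick local demonstration of the quoting rule, using `bash -s` as a stand-in for the remote shell:

```shell
#!/usr/bin/env bash
set -euo pipefail

NAME="parent-value"

# Quoted delimiter: the body is passed verbatim; the child shell expands $NAME.
quoted=$(bash -s <<'EOF'
NAME="child-value"
echo "$NAME"
EOF
)

# Unquoted delimiter: $NAME expands locally, before the child shell runs.
unquoted=$(bash -s <<EOF
echo "$NAME"
EOF
)

echo "quoted:   ${quoted}"     # child-value
echo "unquoted: ${unquoted}"   # parent-value
```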

Gotcha: set -e has subtle exceptions. A bare assignment result=$(failing_command) does trigger the set -e exit, but local result=$(failing_command) does not, because local's own exit status (0) masks the substitution's. failing_command | other_command will not exit unless set -o pipefail is also in effect. And if failing_command; then never exits, because the command runs in a condition context (the same applies to the left-hand side of && and ||). These exceptions trip up even experienced scripters.
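
These exceptions can be demonstrated directly; a related wrinkle worth knowing is that `local var=$(cmd)` masks failures because `local`'s own exit status wins:

```shell
#!/usr/bin/env bash
set -eu   # deliberately no pipefail, to show the pipeline exception

# A failure assigned via `local` is masked: `local` itself returns 0.
demo_local() {
    local out=$(false)
    echo "after local: still running"
}
demo_local

# A failing left-hand side of a pipe is ignored without pipefail.
false | cat
echo "after pipeline: still running"

# A failure in a condition context never triggers set -e.
if false; then
    echo "unreachable"
fi
echo "after if: still running"

# For contrast, a bare assignment DOES propagate the failure
# (run in a child bash so this script itself survives).
bash -c 'set -e; out=$(false); echo unreachable' ||
    echo "bare assignment: child exited with $?"
```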

Testing Bash Scripts

ShellCheck

Always run shellcheck on production scripts:

shellcheck -s bash myscript.sh

ShellCheck catches quoting issues, unused variables, POSIX compatibility problems, and common logic errors.
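
For example, the unquoted expansion that SC2086 flags causes word splitting, which is easy to demonstrate:

```shell
#!/usr/bin/env bash
set -uo pipefail

count_args() { echo "$#"; }

f="my report.txt"   # a value containing a space

count_args $f       # unquoted (SC2086): splits into 2 arguments
count_args "$f"     # quoted: stays one argument
```

With a filename argument to `rm` or `mv`, that same split turns one intended operand into two unintended ones.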

BATS (Bash Automated Testing)

#!/usr/bin/env bats

@test "lock file prevents concurrent runs" {
    mkdir /var/run/fleet-patch.lock
    run ./fleet-patch.sh
    [ "$status" -eq 2 ]
    [[ "$output" == *"Lock held"* ]]
    rmdir /var/run/fleet-patch.lock
}

@test "dry run makes no changes" {
    run ./fleet-patch.sh -n web-01
    [ "$status" -eq 0 ]
    [[ "$output" == *"DRY RUN"* ]]
}

The Deeper Patterns Behind Power One-Liners

Powerful one-liners are not about memorizing syntax — they are about recognizing composable patterns. Once you see these patterns, you can improvise solutions to problems you have never encountered before.

| Pattern | What it means | Example |
| --- | --- | --- |
| Process substitution | Treat command output like a file | `diff <(sort a) <(sort b)` |
| Stream fan-out | One stream, many consumers | `tee >(cmd1) >(cmd2)` |
| State normalization | Convert messy reality into plain text | `find \| sort`, `lsof`, `ps`, `ss` |
| Zero-temp-file transport | Stream instead of save-then-move | `tar \| ssh \| tar` |
| Incremental narrowing | Cheap filter before expensive work | Size-first duplicate finder |
| Time-sliced inspection | Compare now vs. later | `diff <(cmd) <(sleep 10; cmd)` |
| FIFO loop closure | Named pipe turns linear pipes into circuits | `mkfifo \| nc \| tee \| nc > fifo` |

The key insight: the shell stops being a "command runner" and starts acting like a tiny dataflow language. Every power one-liner is really converting a problem into text streams, then composing tiny tools. That is the Unix philosophy in one sentence.
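
As one worked example, "incremental narrowing" can be sketched as a size-first duplicate finder (a sketch assuming GNU find and coreutils; `DIR` is a hypothetical target directory):

```shell
#!/usr/bin/env bash
set -euo pipefail

DIR="${1:-.}"

# Stage 1 (cheap): file sizes that occur more than once.
dup_sizes=$(find "$DIR" -type f -printf '%s\n' | sort -n | uniq -d)

# Stage 2 (expensive, but narrowed): checksum only the candidates that
# shared a size, then group identical hashes (first 32 chars of md5sum).
for size in ${dup_sizes}; do
    find "$DIR" -type f -size "${size}c" -exec md5sum {} +
done | sort | uniq -w32 --all-repeated=separate
```

Files with a unique size never get hashed at all, which is where the savings come from on large trees.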

Name origin: Bash stands for "Bourne Again SHell" -- a pun on "born again" and the Bourne Shell (sh) created by Stephen Bourne at Bell Labs in 1979. Brian Fox wrote Bash for the GNU Project in 1989. The current maintainer is Chet Ramey, who has maintained it since the early 1990s.

