Pipes & Redirection - Street-Level Ops

Real-world patterns for wrangling output, building data pipelines, and debugging I/O in production.

Logging Command Output

# Capture both stdout and stderr to a log while still seeing output
deploy.sh 2>&1 | tee -a /var/log/deploy.log

# Timestamp every line of output
command 2>&1 | while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "${line}"
done | tee -a timestamped.log

# Log a script's entire output automatically
exec > >(tee -a /var/log/myscript.log) 2>&1
echo "This goes to both terminal and log file"
do_work
echo "So does this"

Separating stdout and stderr

# Send stdout and stderr to different files
make build > build_stdout.log 2> build_stderr.log

# Process stdout normally, capture stderr separately
result=$(command 2>/tmp/errors.txt)
if [[ -s /tmp/errors.txt ]]; then
    echo "Errors occurred:"
    cat /tmp/errors.txt
fi

# Show errors on screen, redirect successful output to file
command > output.txt     # stdout to file, stderr still on screen

# Swap stdout and stderr (send stdout to stderr and vice versa)
# Under the hood: fd 3 saves stdout, stdout takes stderr, stderr takes saved stdout
command 3>&1 1>&2 2>&3 3>&-
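A quick way to convince yourself the swap works (the demo helper and its output strings are illustrative):

```shell
# Illustrative helper: writes "out" on stdout and "err" on stderr
demo() { echo out; echo err >&2; }

# After the swap, a command substitution captures what WAS stderr;
# the group's stderr (formerly stdout) is silenced with 2>/dev/null
swapped=$( { demo 3>&1 1>&2 2>&3 3>&-; } 2>/dev/null )
echo "$swapped"   # err
```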

# Capture stderr into a variable while stdout still reaches the terminal
# (needs a controlling terminal, so this fails in cron or CI)
errors=$(command 2>&1 >/dev/tty)
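A variant that works without a TTY parks the real stdout on fd 3 for the duration of the capture. The demo function here is an illustrative stand-in for any command that writes to both streams:

```shell
# Illustrative helper: "out" on stdout, "err" on stderr
demo() { echo out; echo err >&2; }

# fd 3 temporarily holds the outer stdout so the command's stdout can
# bypass the capture; only stderr lands in the variable
{ errors=$(demo 2>&1 1>&3); } 3>&1
echo "captured: ${errors}"   # captured: err
```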

Building Data Pipelines

# CSV processing: extract column 3, sort, deduplicate, count
# (naive split on commas; quoted fields containing commas will break it)
cut -d',' -f3 users.csv | sort | uniq -c | sort -rn | head -20

# Apache/nginx log analysis: top requested URLs
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20

# Multi-stage processing with intermediate inspection
cat raw_data.csv \
    | tee /tmp/stage0_raw.txt \
    | grep -v '^#' \
    | tee /tmp/stage1_no_comments.txt \
    | awk -F',' '$3 > 100 {print $1, $3}' \
    | tee /tmp/stage2_filtered.txt \
    | sort -k2 -rn \
    | head -10

# JSON processing pipeline (with jq)
curl -s 'https://api.example.com/users' \
    | jq -r '.[] | [.name, .email, .last_login] | @csv' \
    | sort -t',' -k3 -r \
    | head -20

# Combine data from multiple sources
paste <(awk '{print $1}' file1.txt) <(awk '{print $3}' file2.txt) | column -t

Redirecting to Both File and Screen

# tee is the standard tool
long_running_command 2>&1 | tee output.log

# Write to multiple destinations
command | tee file1.log file2.log

# Append mode (do not truncate)
command | tee -a /var/log/persistent.log

# Split output: stderr to errors.log (still shown on screen), stdout to all_output.log
command 2> >(tee -a errors.log >&2) | tee -a all_output.log

Background Process Output Management

# Redirect background process output to a file
long_task > /var/log/task.log 2>&1 &
task_pid=$!

# nohup: survive terminal close, output goes to nohup.out
nohup long_task &

# nohup with explicit output file
nohup long_task > /var/log/task.log 2>&1 &

# Discard all output from background process
long_task > /dev/null 2>&1 &

# Check if background process is still running
if kill -0 "${task_pid}" 2>/dev/null; then
    echo "Still running"
fi
wait "${task_pid}"
echo "Exit code: $?"

/dev/tcp for Network Testing

Bash (but not sh or dash) has built-in /dev/tcp and /dev/udp pseudo-devices for network I/O without external tools.

Gotcha: /dev/tcp is a bash-ism, not a real file on disk. Scripts running under sh, dash, or a #!/bin/sh shebang will fail with "No such file or directory." Always use #!/bin/bash explicitly when relying on this feature, and be aware it may be compiled out of bash in some distributions (Debian historically disabled it).

# Test if a port is open (replacement for nc/telnet)
if timeout 3 bash -c 'echo > /dev/tcp/db-server/5432' 2>/dev/null; then
    echo "PostgreSQL port is open"
else
    echo "Cannot reach PostgreSQL"
fi

# Fetch an HTTP response (no curl/wget needed)
exec 3<>/dev/tcp/example.com/80
printf 'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' >&3
cat <&3
exec 3>&-

# Check multiple services in a loop
for service in "web-01:80" "db-01:5432" "cache-01:6379" "mq-01:5672"; do
    host="${service%%:*}"
    port="${service##*:}"
    if timeout 2 bash -c "echo > /dev/tcp/${host}/${port}" 2>/dev/null; then
        printf "%-20s %s\n" "${service}" "OK"
    else
        printf "%-20s %s\n" "${service}" "FAIL"
    fi
done

Process Substitution for Diffing

# Compare running config vs saved config
diff <(nginx -T 2>/dev/null) /etc/nginx/nginx.conf.bak

# Compare package lists between two servers
diff <(ssh server1 "dpkg -l | awk '{print \$2, \$3}'") \
     <(ssh server2 "dpkg -l | awk '{print \$2, \$3}'")

# Compare environment variables between shells
diff <(env | sort) <(ssh remote-host env | sort)

# Find new entries in a log since last check (diff takes the files directly)
diff /var/log/last_check.txt /var/log/auth.log | grep '^>'

# Compare directory listings (find files that differ)
diff <(cd /opt/app-v1 && find . -type f | sort) \
     <(cd /opt/app-v2 && find . -type f | sort)

Pipeline Performance

Useless Use of Cat (UUOC)

cat file | command is functionally identical to command < file but creates an extra process and pipe buffer.

# Wasteful: cat starts a process just to pipe the file
cat access.log | grep "ERROR"

# Better: grep reads the file directly
grep "ERROR" access.log

# Wasteful: cat into awk
cat data.csv | awk -F',' '{print $3}'

# Better: awk reads the file
awk -F',' '{print $3}' data.csv

The performance difference is negligible for small files but measurable on large files or in tight loops. More importantly, command < file makes the data flow clearer.

Gotcha: In a pipeline, the exit code of the entire pipeline is the exit code of the LAST command, not the first. badcommand | grep pattern returns 0 if grep succeeds, even if badcommand failed. Use set -o pipefail in scripts to catch failures anywhere in the pipeline.
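A minimal demonstration of the difference, with false | cat standing in for a failing producer feeding a succeeding consumer:

```shell
# Without pipefail, cat's success masks false's failure
false | cat
echo "default: $?"            # default: 0

set -o pipefail
false | cat
echo "pipefail: $?"           # pipefail: 1
set +o pipefail

# PIPESTATUS (bash) records every stage's exit code regardless of pipefail
false | cat
echo "stages: ${PIPESTATUS[@]}"   # stages: 1 0
```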

There is one legitimate use for cat at the start of a pipeline: when you want to emphasize that data is flowing through a pipeline, or when the command varies and you want consistent syntax.
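The classic case is concatenating several inputs, such as rotated logs, where cat is doing real work at the head of the pipeline. The filenames and data below are illustrative:

```shell
# Two rotated log fragments (illustrative data)
printf 'ERROR boot failed\nok\n' > /tmp/app.log.1
printf 'ERROR disk full\n'       > /tmp/app.log

# cat merges them in order before the filter -- not a UUOC
cat /tmp/app.log.1 /tmp/app.log | grep -c ERROR   # 2
```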

Avoiding Unnecessary Pipelines

# Bad: three processes for a simple count
cat file | grep "pattern" | wc -l

# Better: one process
grep -c "pattern" file

# Bad: sort | uniq when sort -u exists
sort file | uniq

# Better:
sort -u file

# Bad: grep then awk (awk can match too)
grep "ERROR" file | awk '{print $3}'

# Better: awk does both
awk '/ERROR/ {print $3}' file

Redirecting in Loops

# Redirect all loop output to a file
for host in web-{01..10}; do
    echo "Checking ${host}..."
    ssh "${host}" "uptime" 2>/dev/null || echo "  FAILED"
done > health_report.txt 2>&1

# Append to a file inside a loop
while read -r host; do
    result=$(ssh "${host}" "df -h /" 2>/dev/null | tail -1)
    echo "${host}: ${result}" >> disk_report.txt
done < hosts.txt

# Read from one fd, write to another inside a loop
exec 3< hosts.txt
exec 4> results.txt
while read -r host <&3; do
    ssh "${host}" uptime 2>/dev/null >&4
done
exec 3<&-
exec 4>&-

Parallel Processing with Pipes

A common trap: xargs -P and parallel interleave output from concurrent processes. If two jobs print at the same time, you get garbled lines. With parallel, --group (the default) holds each job's output until it finishes, and --line-buffer interleaves output line by line without splitting lines mid-way. For xargs, redirect each job's output to a separate file.
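One way to keep xargs output untangled: write each job to its own file, then concatenate afterwards. The work list and the echo below stand in for real hosts and a real ssh command:

```shell
# Illustrative work list; real use would be hostnames and an ssh command
mkdir -p /tmp/perjob
printf '%s\n' alpha beta gamma > /tmp/perjob/items.txt

# Each parallel job writes its own file, so nothing interleaves; the item
# is passed as "$1" rather than spliced into the shell string (injection-safe)
xargs -P 4 -I {} sh -c 'echo "processed $1" > "/tmp/perjob/$1.out"' _ {} \
    < /tmp/perjob/items.txt

# Recombine in a deterministic order once all jobs are done
cat /tmp/perjob/*.out | sort
```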

# GNU parallel: process lines in parallel (reads stdin, no cat needed)
parallel -j 8 'curl -sI {} | head -1' < urls.txt

# xargs parallel mode (ssh -n keeps ssh from swallowing xargs' input)
xargs -P 10 -I {} ssh -n {} "uptime" < hosts.txt

# Background processes with output collection
for host in web-{01..05}; do
    ssh "${host}" "uptime" > "/tmp/uptime_${host}.txt" 2>&1 &
done
wait
cat /tmp/uptime_web-*.txt

# Process substitution for parallel pipelines
paste <(grep -c "ERROR" app.log) \
      <(grep -c "WARN" app.log) \
      <(grep -c "INFO" app.log) \
    | awk '{printf "ERROR: %d  WARN: %d  INFO: %d\n", $1, $2, $3}'

Practical Recipes

Script that logs everything automatically

#!/usr/bin/env bash
set -euo pipefail

LOG="/var/log/myapp/$(date +%Y%m%d_%H%M%S).log"
exec > >(tee -a "${LOG}") 2>&1

echo "[$(date)] Starting deployment..."
# All output from here on goes to both screen and log

Safely overwrite a file in place

Under the hood: grep pattern file > file truncates the file to zero bytes BEFORE grep starts reading because the shell opens the output file (with truncation) before launching the command. By the time grep runs, the file is empty. This is a shell behavior, not a grep bug.

# WRONG: this truncates the file before grep reads it
grep -v "bad_line" file.txt > file.txt   # file.txt is now empty

# RIGHT: use a temp file
grep -v "bad_line" file.txt > file.txt.tmp && mv file.txt.tmp file.txt

# RIGHT: use sponge (from moreutils)
grep -v "bad_line" file.txt | sponge file.txt

# RIGHT: use sed for in-place editing (GNU sed; BSD/macOS needs sed -i '')
sed -i '/bad_line/d' file.txt

Progress indicator for long pipelines

# Use pv (pipe viewer) to show throughput
pv /var/log/huge.log | grep "ERROR" > errors.txt

# With compressed input
pv /var/log/huge.log.gz | zcat | grep "ERROR" > errors.txt

# Show progress as lines processed (pv reads the file directly)
pv -l /var/log/huge.log | grep "ERROR" > errors.txt

Multiplexing output to multiple consumers

# tee with process substitution: send to three different processors
cat access.log \
    | tee >(grep "ERROR" > errors.log) \
          >(awk '{print $9}' | sort | uniq -c | sort -rn > status_codes.txt) \
          >(grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | sort -u > unique_ips.txt) \
    > /dev/null