Portal | Level: L1: Foundations | Topics: Pipes & Redirection, Bash / Shell Scripting, Linux Fundamentals | Domain: Linux
Pipes & Redirection - Primer¶
Why This Matters¶
Every Unix command is a small program that reads input, processes it, and writes output. Pipes and redirection are the connective tissue that lets you compose these small programs into powerful data processing pipelines without writing code. They are not a convenience feature — they are the fundamental design pattern of Unix.
Who made it: The pipe concept was proposed by Doug McIlroy at Bell Labs, and Ken Thompson implemented it in Unix in 1973, famously overnight, after McIlroy's suggestion. McIlroy's famous philosophy: "Write programs that do one thing and do it well. Write programs to work together." The | character was chosen because it was rarely used in existing code. If you cannot fluently redirect output, split streams, and build pipelines, you cannot operate effectively on any Linux system.
File Descriptors: The Foundation¶
Every process has three standard file descriptors open at birth:
| FD | Name | Default | Purpose |
|---|---|---|---|
| 0 | stdin | Keyboard / terminal | Input to the program |
| 1 | stdout | Terminal screen | Normal output |
| 2 | stderr | Terminal screen | Error messages, diagnostics |
These are just numbers that the kernel uses to track open files. Everything in Unix is a file — terminals, pipes, sockets, actual files — and file descriptors are handles to those files.
# See the file descriptors of a running process
ls -la /proc/$$/fd
# 0 -> /dev/pts/0 (stdin — your terminal)
# 1 -> /dev/pts/0 (stdout — your terminal)
# 2 -> /dev/pts/0 (stderr — your terminal)
Output Redirection¶
Redirect stdout to a file¶
# Write stdout to a file (creates or truncates)
echo "hello" > output.txt
# Append stdout to a file
echo "world" >> output.txt
# Command output to file
ls -la /etc > etc_listing.txt
The > operator truncates the file first. If the file does not exist, it is created. If it exists, its contents are destroyed before the command runs.
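Because truncation happens before the command even starts, trying to filter a file "in place" with > destroys it. A quick demonstration in a scratch directory (demo.txt is a throwaway name):

```shell
# The shell truncates the target BEFORE launching the command,
# so the command reads an already-empty file.
cd "$(mktemp -d)"
printf 'banana\napple\ncherry\n' > demo.txt
sort demo.txt > demo.txt    # demo.txt is truncated first; sort sees an empty file
wc -c < demo.txt            # 0: the data is gone
```

Tools like sort -o file (write result back to the input file) and sponge from moreutils exist precisely to avoid this trap.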
Redirect stderr to a file¶
# Redirect stderr (fd 2) to a file
find / -name "*.conf" 2> errors.txt
# Append stderr
find / -name "*.conf" 2>> errors.txt
# Discard stderr entirely
find / -name "*.conf" 2>/dev/null
Redirect both stdout and stderr¶
# Redirect both to the same file (bash)
command &> output.txt # bash shorthand
command > output.txt 2>&1 # POSIX-compatible
# Redirect to separate files
command > stdout.txt 2> stderr.txt
# Append both to the same file
command >> output.txt 2>&1
command &>> output.txt # bash shorthand
The 2>&1 syntax means "redirect fd 2 to wherever fd 1 currently points." This is why order matters — it must come after > if you want both in the same file.
Gotcha: command 2>&1 > file.txt and command > file.txt 2>&1 are NOT the same. In the first, stderr goes to the terminal (where stdout was pointing when 2>&1 was evaluated), then stdout goes to the file. In the second, stdout goes to the file first, then stderr follows it there. Redirections are evaluated left to right.
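A small sketch makes the difference concrete. Here emit is a hypothetical helper that writes one line to stdout and one to stderr:

```shell
# Hypothetical helper: one line to stdout, one to stderr
emit() { echo out; echo err >&2; }

cd "$(mktemp -d)"
emit > both.txt 2>&1          # stdout -> file, then stderr -> same file
wc -l < both.txt              # 2: both lines captured

emit 2>&1 > only_out.txt      # stderr -> terminal (old stdout), then stdout -> file
wc -l < only_out.txt          # 1: "err" went to the terminal instead
```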
Input Redirection¶
# Read input from a file instead of keyboard
sort < unsorted.txt
# Combine input and output redirection
sort < unsorted.txt > sorted.txt
# Mail with body from file
mail -s "Report" admin@example.com < report.txt
Pipes¶
A pipe connects the stdout of one command to the stdin of the next. The kernel creates an in-memory buffer between them.
# Basic pipe: list files, filter for names containing ".log"
ls -la /var/log | grep '\.log'
# Multi-stage pipeline: find, filter, sort, count
cat access.log | grep "GET /api" | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
# Pipes are concurrent — all stages run in parallel
# Data flows through the pipeline as it is produced
How Pipes Work Internally¶
When you write cmd1 | cmd2:
- The shell creates an anonymous pipe (a kernel buffer, typically 64KB on Linux)
- It forks cmd1 with stdout connected to the write end of the pipe
- It forks cmd2 with stdin connected to the read end of the pipe
- Both commands run concurrently
- When cmd1 writes faster than cmd2 reads, it blocks until the buffer drains
- When cmd2 reads faster than cmd1 writes, it blocks until data is available
This is producer-consumer concurrency, built into the kernel.
/dev/null and Special Files¶
# /dev/null — the bit bucket (discards everything written to it)
command > /dev/null 2>&1 # silence all output
grep "pattern" file 2>/dev/null # silence errors only
# /dev/zero — infinite stream of zero bytes
dd if=/dev/zero of=testfile bs=1M count=100 # create a 100MB test file
# /dev/urandom — infinite stream of pseudorandom bytes
dd if=/dev/urandom bs=32 count=1 2>/dev/null | base64 # generate a random token
head -c 16 /dev/urandom | xxd -p # 16 random hex bytes
# /dev/stdin, /dev/stdout, /dev/stderr — process self-references
echo "to stderr" > /dev/stderr
cat /dev/stdin # read from own stdin
Here Documents and Here Strings¶
Here Documents (<<)¶
A here document feeds a block of text as stdin to a command:
# Basic here document
cat <<EOF
Hello, ${USER}.
Today is $(date).
Your home is ${HOME}.
EOF
# Quoted delimiter prevents variable expansion
cat <<'EOF'
This $VARIABLE is not expanded.
Neither is $(this command).
EOF
# Indented here document (<<- strips leading TAB characters only; spaces
# are not stripped, and the closing delimiter may be tab-indented too)
if true; then
	cat <<-EOF
	This line can be indented with tabs.
	The tabs are stripped from the output.
	EOF
fi
Here documents are essential for:
- Generating config files in scripts
- Feeding multi-line input to commands like mysql, psql, ssh
- Inline test data in scripts
# Feed SQL to postgres
psql -U admin mydb <<EOF
SELECT count(*) FROM users WHERE created_at > now() - interval '1 day';
EOF
# Run commands on a remote host
ssh webserver01 <<'EOF'
systemctl status nginx
df -h /var/log
tail -5 /var/log/nginx/error.log
EOF
Here Strings (<<<)¶
A here string feeds a single string as stdin. Supported by bash, zsh, and ksh, but not POSIX sh.
# Feed a string to a command
grep "error" <<< "this is an error message"
# Use with read to split a string
IFS=: read -r user _ uid gid _ home shell <<< "root:x:0:0:root:/root:/bin/bash"
echo "User: ${user}, UID: ${uid}, Home: ${home}"
# Avoid echo | command pattern
# Instead of: echo "data" | base64
# Use: base64 <<< "data"
Process Substitution¶
Process substitution (<() and >()) creates a temporary named pipe that looks like a filename to the command receiving it. This lets you use command output where a filename is expected.
# Compare output of two commands (diff requires filenames)
diff <(ls /etc/nginx/sites-available/) <(ls /etc/nginx/sites-enabled/)
# Compare sorted output of two database queries
diff <(psql -c "SELECT id FROM users ORDER BY id" db1) \
<(psql -c "SELECT id FROM users ORDER BY id" db2)
# Feed multiple process outputs to a command
paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f7 /etc/passwd)
# Use output substitution to write to multiple destinations
command | tee >(grep "ERROR" > errors.log) >(grep "WARN" > warnings.log) > /dev/null
Process substitution is a bash/zsh feature, not available in POSIX sh or dash.
Under the hood: Process substitution creates a /dev/fd/N file descriptor path (or a named pipe on systems without /dev/fd). When you write diff <(cmd1) <(cmd2), bash creates two pipes, runs cmd1 and cmd2 with their stdout connected to those pipes, and passes the paths /dev/fd/63 and /dev/fd/62 to diff as if they were filenames.
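You can see the substituted path yourself, since on Linux bash hands the command a /dev/fd path that behaves like a readable file:

```shell
echo <(true)                        # prints a path such as /dev/fd/63
cat <(printf 'one\ntwo\n') | wc -l  # the path reads like a file: 2 lines
```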
Named Pipes (FIFOs)¶
A named pipe is a persistent pipe in the filesystem. One process writes, another reads, and data flows between them.
# Create a named pipe
mkfifo /tmp/mypipe
# Terminal 1: read from pipe (blocks until data arrives)
cat /tmp/mypipe
# Terminal 2: write to pipe (blocks until reader is connected)
echo "data from another process" > /tmp/mypipe
# Clean up
rm /tmp/mypipe
Named pipes are useful for:
- Inter-process communication between unrelated processes
- Feeding data between long-running daemons
- Creating processing pipelines that survive across commands
# Example: persistent log filter
mkfifo /tmp/error_pipe
# Background: filter errors to a file
grep --line-buffered "ERROR" < /tmp/error_pipe > /var/log/errors_only.log &
# Foreground: application writes to the pipe
myapp > /tmp/error_pipe 2>&1
tee — Splitting Output¶
tee reads from stdin and writes to both stdout and one or more files simultaneously.
# Write to screen and file
make build 2>&1 | tee build.log
# Append instead of overwrite
command | tee -a logfile.txt
# Write to multiple files
command | tee file1.txt file2.txt file3.txt
# Use in pipelines: capture intermediate output
cat data.csv | tee raw_data.log | sort | tee sorted_data.log | uniq -c > final.txt
xargs — Stdin to Arguments¶
Many commands do not read stdin — they take arguments. xargs bridges the gap by converting stdin lines into command arguments.
# Delete all .tmp files found by find
find /tmp -name "*.tmp" -print0 | xargs -0 rm -f
# Run a command on each line of input
cat hosts.txt | xargs -I {} ssh {} "uptime"
# Parallel execution
cat urls.txt | xargs -P 4 -I {} curl -s -o /dev/null -w "%{url_effective}: %{http_code}\n" {}
# Batch arguments (pass 50 files at a time to grep)
find . -name '*.py' | xargs -n 50 grep "import"
| Flag | Purpose |
|---|---|
| -0 | Use null delimiter (pair with find -print0) |
| -I {} | Replace {} with each input line |
| -n N | Pass N arguments at a time |
| -P N | Run N parallel processes |
| -L 1 | Run command once per input line |
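The batching flags are easiest to see with plain echo, which is safe to run anywhere:

```shell
# -n 2: each echo invocation receives at most two arguments
printf '1\n2\n3\n4\n5\n' | xargs -n 2 echo
# 1 2
# 3 4
# 5

# -I {}: one invocation per input line, with {} substituted
printf 'alpha\nbeta\n' | xargs -I {} echo "host={}"
# host=alpha
# host=beta
```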
Command Substitution¶
Command substitution captures the stdout of a command and inserts it as text.
# Modern syntax: $()
current_date=$(date +%Y-%m-%d)
file_count=$(find . -name '*.py' | wc -l)
git_sha=$(git rev-parse --short HEAD)
# Legacy syntax: backticks (avoid — hard to nest, hard to read)
current_date=`date +%Y-%m-%d`
# Nesting works naturally with $()
echo "Kernel: $(uname -r) on $(hostname) ($(uname -m))"
# Nesting is painful with backticks (must escape inner backticks)
echo "Kernel: `uname -r` on `hostname`"
Always use $() syntax. Backticks are a legacy holdover that make code harder to read and impossible to nest cleanly.
Subshells and Pipes¶
Each segment of a pipeline runs in a subshell. This has critical implications for variable scope.
# Variables set in a pipeline segment are LOST after the pipeline
count=0
cat data.txt | while read -r line; do
count=$(( count + 1 ))
done
echo "${count}" # Prints 0 — the while loop ran in a subshell
# Fix: use process substitution to avoid the subshell
count=0
while read -r line; do
count=$(( count + 1 ))
done < <(cat data.txt)
echo "${count}" # Prints the correct count
# Fix: use lastpipe (bash 4.2+)
shopt -s lastpipe
count=0
cat data.txt | while read -r line; do
count=$(( count + 1 ))
done
echo "${count}" # Now correct — last pipe segment runs in current shell
PIPESTATUS and pipefail¶
PIPESTATUS¶
PIPESTATUS is a bash array containing the exit code of each command in the most recent pipeline.
false | true | false
echo "${PIPESTATUS[@]}" # 1 0 1
echo "${PIPESTATUS[0]}" # 1 (first command)
echo "${PIPESTATUS[1]}" # 0 (second command)
echo "${PIPESTATUS[2]}" # 1 (third command)
Without PIPESTATUS, $? only gives you the exit code of the last command in the pipeline. A failing first stage goes unnoticed.
pipefail¶
# Without pipefail: pipeline succeeds if the LAST command succeeds
false | true
echo $? # 0 — failure is hidden
# With pipefail: pipeline fails if ANY command fails
set -o pipefail
false | true
echo $? # 1 — failure is caught
Remember: The safe script header mnemonic: set -euo pipefail. Exit on error (-e), Undefined variables are errors (-u), pipefail catches failures in pipelines (-o pipefail). Memorize this as "EUO" and put it at the top of every bash script.
Every production script should use set -o pipefail. Without it, broken pipelines silently produce partial or corrupt output.
File Descriptor Manipulation¶
For advanced redirection, you can open, close, and duplicate file descriptors using exec:
# Open fd 3 for writing to a log file
exec 3> /var/log/myapp/debug.log
# Write to fd 3 throughout the script
echo "Starting process" >&3
do_work
echo "Work complete, exit code: $?" >&3
# Close fd 3
exec 3>&-
# Open fd 4 for reading
exec 4< /etc/hosts
while read -r line <&4; do
echo "Host: ${line}"
done
exec 4<&-
# Swap stdout and stderr (advanced)
command 3>&1 1>&2 2>&3 3>&-
File descriptor manipulation is primarily useful for:
- Logging to multiple destinations
- Separating different output streams
- Implementing progress indicators (data on stdout, progress on fd 3)
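The progress-indicator pattern can be sketched like this (produce is a hypothetical function; the caller decides where fd 3 goes):

```shell
# Hypothetical producer: records on stdout, progress messages on fd 3
produce() {
  for i in 1 2 3; do
    echo "record $i"              # data -> stdout
    echo "progress: $i/3" >&3     # progress -> fd 3
  done
}

cd "$(mktemp -d)"
produce 3>&2 > records.txt        # route progress to stderr, data to a file
wc -l < records.txt               # 3: only the data lines landed in the file
```

Because the two streams use different descriptors, a caller can capture the data while still watching progress live, or discard the progress entirely with 3>/dev/null.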
Putting It All Together¶
A real-world data processing pipeline:
#!/usr/bin/env bash
set -euo pipefail
# Process web server access logs:
# 1. Decompress rotated logs
# 2. Filter for API requests
# 3. Extract response times
# 4. Calculate statistics
{
# Current log
cat /var/log/nginx/access.log
# Rotated compressed logs
zcat /var/log/nginx/access.log.*.gz
} | grep -E 'GET /api/' \
| awk '{print $NF}' \
| sort -n \
| tee >(wc -l > /tmp/total_requests.txt) \
| awk '{
sum += $1; count++; values[count] = $1
} END {
print "Total requests:", count
print "Average:", sum/count, "ms"
print "Median:", values[int(count/2)]
print "P99:", values[int(count*0.99)]
print "Max:", values[count]
}'
This pipeline decompresses, filters, extracts, sorts, counts, and computes statistics, all without you writing a single temporary file by hand. Memory is not constant, though: sort buffers (and spills to its own temp files) as needed, and the final awk stores every value in order to compute the median and P99.
Wiki Navigation¶
Related Content¶
- Advanced Bash for Ops (Topic Pack, L1) — Bash / Shell Scripting, Linux Fundamentals
- Bash Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Bash / Shell Scripting, Linux Fundamentals
- Environment Variables (Topic Pack, L1) — Bash / Shell Scripting, Linux Fundamentals
- LPIC / LFCS Exam Preparation (Topic Pack, L2) — Bash / Shell Scripting, Linux Fundamentals
- Linux Ops (Topic Pack, L0) — Bash / Shell Scripting, Linux Fundamentals
- Linux Ops Drills (Drill, L0) — Bash / Shell Scripting, Linux Fundamentals
- Process Management (Topic Pack, L1) — Bash / Shell Scripting, Linux Fundamentals
- RHCE (EX294) Exam Preparation (Topic Pack, L2) — Bash / Shell Scripting, Linux Fundamentals
- Regex & Text Wrangling (Topic Pack, L1) — Bash / Shell Scripting, Linux Fundamentals
- Track: Foundations (Reference, L0) — Bash / Shell Scripting, Linux Fundamentals