---
title: What Happens Inside a Linux Pipe
tags:
  - lesson
  - pipes
  - file-descriptors
  - kernel-buffers
  - process-scheduling
  - backpressure
---

# What Happens Inside a Linux Pipe
Topics: pipes, file descriptors, kernel buffers, process scheduling, backpressure
Level: L1–L2 (Foundations → Operations)
Time: 45–60 minutes
Prerequisites: Basic shell usage (you've used | before)
## The Mission
You type a six-command pipeline and get the top 10 IP addresses causing 500 errors. It takes about 2 seconds on a 50-million-line log file. Six commands, running simultaneously, each doing one job, data flowing between them at memory speed.
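The original command isn't shown here, but a six-command pipeline fitting that description might look like this (a sketch; the combined log format, with the status code in awk field 9, is an assumption):

```shell
# A tiny stand-in for the 50-million-line access.log
printf '%s\n' \
  '10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET /a HTTP/1.1" 500 42' \
  '10.0.0.2 - - [01/Jan/2024:00:00:01 +0000] "GET /b HTTP/1.1" 200 17' \
  '10.0.0.1 - - [01/Jan/2024:00:00:02 +0000] "GET /c HTTP/1.1" 500 99' \
  > access.log

# Filter 500s and extract the client IP, then count, rank, and take the top 10
cat access.log | awk '$9 == 500 {print $1}' | sort | uniq -c | sort -rn | head -10
# → 2 10.0.0.1  (uniq -c pads the count with leading spaces)
```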
How does this actually work? How does grep know to wait for cat? How does sort get
all its input before sorting? What happens when head -10 gets its 10 lines and stops —
does the whole pipeline keep running?
## What a Pipe Is
A pipe is a kernel buffer — a 64KB chunk of memory (on modern Linux) that connects the stdout of one process to the stdin of the next.
- `cat` writes to its stdout (fd 1), which is connected to the write end of the pipe
- `grep` reads from its stdin (fd 0), which is connected to the read end of the pipe
- The kernel manages the buffer between them
```shell
# See the maximum pipe buffer size
cat /proc/sys/fs/pipe-max-size
# → 1048576 (1MB max; the default starts at 64KB)

# The actual default (1032 is F_GETPIPE_SZ on Linux)
python3 -c "import fcntl; import os; r,w = os.pipe(); print(fcntl.fcntl(w, 1032))"
# → 65536 (64KB)
```
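Under the hood, the shell does this wiring with `pipe()`, `fork()`, and `dup2()`. Here is a minimal Python sketch of `printf … | grep 500`; the second pipe is not part of the shell's job and is added only so the parent can capture grep's output:

```python
import os

# Pipe 1 connects the writer to grep; pipe 2 lets the parent capture grep's output
r1, w1 = os.pipe()
r2, w2 = os.pipe()

if os.fork() == 0:
    # Writer child plays "cat": its stdout (fd 1) becomes pipe 1's write end
    os.dup2(w1, 1)
    for fd in (r1, w1, r2, w2):
        os.close(fd)
    os.execvp("printf", ["printf", "error 500\nok 200\n"])

if os.fork() == 0:
    # Reader child plays "grep": stdin (fd 0) becomes pipe 1's read end,
    # stdout (fd 1) becomes pipe 2's write end
    os.dup2(r1, 0)
    os.dup2(w2, 1)
    for fd in (r1, w1, r2, w2):
        os.close(fd)
    os.execvp("grep", ["grep", "500"])

# Parent closes its copies; otherwise grep never sees EOF on pipe 1
os.close(r1); os.close(w1); os.close(w2)
result = b""
while chunk := os.read(r2, 4096):
    result += chunk
os.close(r2)
os.wait(); os.wait()
print(result.decode(), end="")
# → error 500
```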
Name Origin: Pipes were conceived by Doug McIlroy in 1964 and implemented by Ken Thompson in a single night in 1973 for Unix V3. The `|` character was chosen because it visually suggests a conduit — data flows through it. McIlroy's original memo: "We should have some way of coupling programs like a garden hose — screw in another segment when it becomes necessary to massage data in another way."
## Backpressure: How Fast Processes Wait for Slow Ones
What happens when cat writes faster than grep reads?
1. `cat` writes to the pipe buffer
2. The buffer fills up (64KB)
3. `cat` tries to write more → the kernel blocks `cat` (puts it to sleep)
4. `grep` reads from the buffer → frees space
5. The kernel wakes `cat` → `cat` writes more
This is backpressure — a fast producer is automatically slowed down to match a slow consumer. No configuration needed. The kernel handles it.
```
cat (fast) ─→ [buffer FULL] ─→ grep (slow)
    ↑ sleeping                    ↑ reading

cat (fast) ─→ [buffer has space] ─→ grep (slow)
    ↑ writing again                   ↑ just read some
```
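You can find the point where backpressure would kick in: make the write end non-blocking and, with no reader draining, count how many bytes fit before the kernel refuses more (a sketch; 65536 assumes the default Linux buffer size):

```python
import os

r, w = os.pipe()
os.set_blocking(w, False)  # a full buffer now raises instead of sleeping

written = 0
try:
    while True:
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    # A blocking writer would be put to sleep at exactly this point
    pass

print(written)
# → 65536 (the default 64KB pipe buffer)
```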
Under the Hood: When a process writes to a full pipe, the kernel suspends it in the `TASK_INTERRUPTIBLE` state. When the reader drains some data, the kernel wakes the writer via `wake_up_interruptible()`. This happens thousands of times per second in a busy pipeline. Each context switch takes ~1–2 microseconds on modern hardware.
## The Parallel Execution Surprise
All commands in a pipeline run simultaneously, not sequentially. A pipeline such as `cat access.log | grep 500 | sort | uniq -c` starts 4 processes at the same time. They run in parallel:
1. `cat` reads the file and writes to pipe 1
2. `grep` reads from pipe 1, filters, writes to pipe 2
3. `sort` reads from pipe 2, accumulates all input, then sorts
4. `uniq` reads from pipe 3 (sort's output) and counts
Most are streaming: they process data as it arrives. sort is the exception — it must
read ALL input before producing any output (you can't sort a partial list). This is why
sort is often the bottleneck in pipelines.
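The difference is easy to see: `cat` streams each line the moment it arrives, while `sort` stays silent until its stdin closes:

```shell
# cat streams: "b" prints immediately, "a" about a second later
(echo b; sleep 1; echo a) | cat

# sort blocks: one second of silence, then sorted output all at once
(echo b; sleep 1; echo a) | sort
# → a
# → b
```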
```shell
# See all processes in a pipeline (yes keeps it alive long enough to observe;
# the original echo version exits too quickly to catch in ps)
yes | cat | cat | cat > /dev/null &
ps aux | grep '[c]at'
# → You'll see 3 cat processes running simultaneously
kill %1
```
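Another quick proof: three one-second sleeps in a pipeline finish in about one second, not three, because every stage starts at once.

```shell
time (sleep 1 | sleep 1 | sleep 1)
# → real ≈ 1.0s, not 3s: the stages ran concurrently
```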
## SIGPIPE: When the Reader Stops
`yes` produces infinite output. `head -5` reads 5 lines and exits. What happens to `yes`?

1. `head` reads 5 lines, closes its stdin (the read end of the pipe), and exits
2. `yes` tries to write to the pipe
3. The pipe has no reader → the kernel sends SIGPIPE to `yes`
4. `yes` dies immediately (the default SIGPIPE action is to terminate)
This is how head -10 works efficiently on huge pipelines. The moment it has its 10 lines,
it exits. SIGPIPE kills everything upstream. No wasted processing.
```shell
# Prove it — time a pipeline with and without head
time cat /dev/urandom | base64 | head -1000 > /dev/null
# → ~0.01s (head stops the pipeline after 1000 lines)

time cat /dev/urandom | base64 | wc -l
# → never finishes — /dev/urandom is infinite (Ctrl-C to stop it)
```
Gotcha: If a program catches or ignores SIGPIPE (some do for robustness), it will get EPIPE on the next write instead. If it ignores THAT too, it keeps running and writing to a broken pipe, wasting CPU. This is why some pipeline commands seem to hang after `head` exits.
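A sketch of that failure mode: ignore SIGPIPE, write to a pipe whose read end is closed, and the write raises EPIPE (`BrokenPipeError` in Python) instead of killing the process:

```python
import os
import signal

# Ignore SIGPIPE, as "robust" programs sometimes do
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

r, w = os.pipe()
os.close(r)  # no reader left on this pipe

got_epipe = False
try:
    os.write(w, b"data")
except BrokenPipeError:  # the errno is EPIPE
    got_epipe = True

print(got_epipe)
# → True: the process survived and must handle the error itself
```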
## Named Pipes (FIFOs): Pipes Without a Pipeline
Normal pipes exist only between processes connected by |. Named pipes are files on the
filesystem that act as pipes between unrelated processes:
```shell
# Create a named pipe
mkfifo /tmp/mypipe

# Terminal 1: write to it (blocks until someone reads)
echo "hello from terminal 1" > /tmp/mypipe

# Terminal 2: read from it
cat /tmp/mypipe
# → hello from terminal 1

# Clean up
rm /tmp/mypipe
```
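The same demo works in one terminal by backgrounding the writer; its redirection blocks until a reader opens the FIFO (the file name here is arbitrary):

```shell
mkfifo /tmp/demo.fifo
# Writer in the background: its open() blocks until a reader shows up
echo "hello through a fifo" > /tmp/demo.fifo &
msg=$(cat /tmp/demo.fifo)  # reader arrives; the writer unblocks
wait
echo "$msg"
# → hello through a fifo
rm /tmp/demo.fifo
```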
Trivia: Named pipes were added to Unix in System III (1982). They're used by some programs for inter-process communication (MySQL can listen on a Unix socket, which is similar). The `mkfifo` command creates them; they look like regular files in `ls -la` but with a `p` type flag: `prw-r--r-- 1 user user 0 ... /tmp/mypipe`
## Flashcard Check
Q1: What is the default pipe buffer size on Linux?
64KB. When full, the writer is put to sleep. When the reader drains some data, the writer is woken up. This is automatic backpressure.
Q2: `yes | head -5` — yes produces infinite output. Why does it stop?
`head` reads 5 lines, closes its end of the pipe, and exits. The kernel sends SIGPIPE to `yes`, which terminates. No wasted output.
Q3: Do pipeline commands run sequentially or in parallel?
Parallel. All processes start simultaneously and run concurrently. Data flows between them via kernel buffers. `sort` is a bottleneck because it must read all input first.
Q4: Who invented Unix pipes?
Doug McIlroy conceived the idea in 1964. Ken Thompson implemented them in a single night in 1973 for Unix V3.
## Cheat Sheet
### Pipe Internals
| Concept | Detail |
|---|---|
| Buffer size | 64KB default (configurable up to 1MB) |
| Backpressure | Writer sleeps when buffer full |
| SIGPIPE | Sent when writing to pipe with no reader |
| Parallelism | All pipeline stages run simultaneously |
| Blocking | sort must read all input before outputting |
### Useful Pipe Patterns
```shell
# Process substitution (feed two commands from one source)
diff <(sort file1) <(sort file2)

# Tee (send output to file AND next command)
cat access.log | tee /tmp/copy.log | grep 500

# Named pipe (connect unrelated processes)
mkfifo /tmp/pipe; cmd1 > /tmp/pipe & cmd2 < /tmp/pipe
```
## Takeaways
- Pipes are 64KB kernel buffers. Backpressure is automatic — fast writers wait for slow readers. No configuration needed.
- Pipeline commands run in parallel. They're concurrent processes, not sequential steps. This is why pipelines are fast.
- SIGPIPE makes pipelines efficient. `head -10` exits and kills the entire upstream. No wasted processing on lines 11 through infinity.
- `sort` is the pipeline bottleneck. It must consume all input before producing output. Every other common tool streams.
- Pipes arrived in 1973 and the core design hasn't changed: one-way data flow, automatic backpressure, SIGPIPE cleanup. Fifty years on, it still holds up.
## Related Lessons
- The Hanging Deploy — processes and signals (SIGPIPE is a signal)
- strace: Reading the Matrix — tracing pipe reads and writes
- What Happens When You Type a Regex — grep inside pipes