Linux Signals & Process Control - Primer¶
Why This Matters¶
Every time you deploy a service, restart a daemon, or press Ctrl+C, you are sending signals. Signals are the kernel's mechanism for delivering asynchronous notifications to processes. They govern graceful shutdown, configuration reload, job control, and crash handling. If you do not understand signals, you cannot write proper shutdown handlers, you cannot debug stuck processes, and you cannot explain why kill -9 leaves your system in a broken state.
Timeline: Signals have been part of Unix since the very first version (1971). The original Unix V1 had only a handful of signals. The modern signal set was standardized by POSIX.1 (IEEE 1003.1, 1988) and has remained stable since. Some signal numbers vary between architectures (on Linux, x86 and ARM share one numbering, while MIPS, SPARC, and Alpha differ), but the names and semantics are consistent across all POSIX systems. The kill -l command shows the signal numbers for your specific platform.
What Signals Are¶
A signal is an asynchronous notification sent to a process by the kernel, by another process, or by the process itself. When a signal arrives, the process can:
- Handle it — run a registered signal handler function
- Ignore it — explicitly choose to do nothing
- Take the default action — whatever the kernel does if no handler is registered (usually terminate or ignore)
Two signals cannot be caught or ignored: SIGKILL (9) and SIGSTOP (19). The kernel enforces these directly. Everything else is negotiable.
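The three dispositions can be seen from a shell script. The following is a minimal sketch, assuming a POSIX-style shell on Linux; the variable names are illustrative only, and sleep stands in for a real program with no handler installed.

```shell
# 1. Handle: register a handler, then signal ourselves.
handled=no
trap 'handled=yes' USR1
kill -USR1 $$
echo "handled: $handled"

# 2. Ignore: an empty handler string tells the shell to ignore the signal.
trap '' USR2
kill -USR2 $$
echo "still running after an ignored USR2"

# 3. Default action: a child with no handler is terminated by SIGTERM.
sleep 30 &
child=$!
kill -TERM "$child"
child_status=0
wait "$child" || child_status=$?    # "||" keeps this safe under set -e
echo "child exit status: $child_status"

# Restore the default dispositions for this shell.
trap - USR1 USR2
```

The handler runs as soon as the kill builtin returns, which is why the flag is already set when the next echo executes.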
The Signals You Must Know¶
| Signal | Number | Default | Catchable? | Purpose |
|---|---|---|---|---|
| SIGTERM | 15 | Terminate | Yes | Polite shutdown request |
| SIGKILL | 9 | Terminate | No | Unconditional kill |
| SIGINT | 2 | Terminate | Yes | Ctrl+C — interrupt |
| SIGHUP | 1 | Terminate | Yes | Config reload / terminal hangup |
| SIGUSR1 | 10 | Terminate | Yes | Application-defined |
| SIGUSR2 | 12 | Terminate | Yes | Application-defined |
| SIGSTOP | 19 | Stop | No | Unconditional pause |
| SIGCONT | 18 | Continue | Yes | Resume stopped process |
| SIGCHLD | 17 | Ignore | Yes | Child state change (enables reaping) |
| SIGPIPE | 13 | Terminate | Yes | Write to broken pipe/socket |
| SIGSEGV | 11 | Core dump | Yes | Segmentation fault |
SIGTERM vs SIGKILL¶
SIGTERM says "please shut down." The process can catch it, flush buffers, close connections, remove lock files, and exit cleanly. Docker, Kubernetes, and systemd all send SIGTERM first.
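SIGKILL says "you are dead now." The kernel terminates the process immediately. No handler runs and no cleanup code executes. Unflushed user-space buffers are lost, and lock files, temp files, and System V shared memory segments are left behind. The kernel still closes the process's file descriptors, so peers see their connections drop abruptly.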
Remember: The signal escalation sequence in production: TERM, wait, KILL. Always send SIGTERM first and give the process time to clean up (Kubernetes default: 30 seconds via terminationGracePeriodSeconds). Only send SIGKILL as a last resort. The mnemonic: "Ask politely (15), then execute (9)." Exit code = 128 + signal number: SIGTERM (15) = 143, SIGKILL (9) = 137.
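The TERM, wait, KILL sequence can be sketched as a small shell function. This is an illustration under stated assumptions, not how any particular orchestrator implements it: stop_gracefully is a hypothetical helper name, and it polls with kill -0 once per second.

```shell
# stop_gracefully PID [GRACE_SECONDS]   (hypothetical helper)
# Returns 0 if the process went away within the grace period,
#         1 if SIGKILL had to be sent,
#         2 if the first signal could not be delivered at all.
stop_gracefully() {
    pid=$1
    grace=${2:-30}                        # 30 mirrors the Kubernetes default
    kill -TERM "$pid" 2>/dev/null || return 2   # not running, or not ours to signal
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # process is gone
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$pid" 2>/dev/null         # last resort: no cleanup will run
    return 1
}

# Usage (commented out so that sourcing this file has no side effects):
#   stop_gracefully "$(cat /var/run/my-service.pid)" 30
```

The helper is aimed at processes the script did not start itself. For the script's own children, wait is the better tool: an exited child that has not yet been reaped still answers kill -0, so the loop may run for the full grace period, and wait also returns the 128 + signal exit status directly.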
SIGHUP — Reload Without Restart¶
Modern daemons repurpose SIGHUP as "reload your configuration." This avoids dropping active connections during a restart.
kill -HUP $(cat /var/run/nginx.pid) # nginx reloads config
kill -HUP $(pidof haproxy) # HAProxy reloads
systemctl reload nginx # Same thing via systemd
When you close a terminal, the kernel sends SIGHUP to the session leader (usually the shell), and the shell forwards it to its jobs. This is why background processes die on logout.
SIGUSR1/2, SIGSTOP/CONT, SIGCHLD, SIGPIPE¶
SIGUSR1/2 have no predefined meaning. Applications define their own behavior (toggle debug logging, dump state, reopen log files). dd reports progress on SIGUSR1.
SIGSTOP freezes a process (uncatchable). SIGCONT resumes it. Useful during incidents: freeze a misbehaving process to stop damage without losing its state.
SIGCHLD notifies a parent when a child exits, stops, or continues. A parent that responds by calling wait() reaps the exited child, which prevents zombies from accumulating.
SIGPIPE fires when writing to a pipe or socket whose reader has closed. Long-running services usually ignore or handle it so that a client disconnect does not kill the whole process.
The kill Command Family¶
# kill — by PID
kill PID # SIGTERM
kill -9 PID # SIGKILL
kill -HUP PID # SIGHUP
kill -0 PID # Test if alive (no signal sent)
kill -TERM -PGID # Signal entire process group (negative PID)
# pgrep — find by pattern
pgrep nginx # PIDs matching name
pgrep -a nginx # PIDs + full command
pgrep -u www-data # By user
pgrep -f "python app.py" # Match full command line
pgrep -P 1234 # Children of PID 1234
# pkill — signal by pattern (safer than ps|grep|kill)
pkill -TERM nginx
pkill -HUP -f "gunicorn"
pkill -9 -u baduser
pkill -TERM -P 1234 # Kill children of PID 1234
Process States¶
| State | Code | Meaning | Impact |
|---|---|---|---|
| Running | R | On CPU or in run queue | Normal |
| Sleeping (interruptible) | S | Waiting for event, can be signaled | Normal — most processes live here |
| Sleeping (uninterruptible) | D | Waiting for I/O, cannot be signaled | Cannot be killed — fix the I/O subsystem |
| Stopped | T | Paused by SIGSTOP/SIGTSTP | Job control, debugging |
| Zombie | Z | Exited, parent has not called wait() | Holds a PID slot, cannot be killed |
Reading ps STAT Column¶
ps aux
# STAT column: Ss = sleeping + session leader, R+ = running + foreground
# Modifiers: s=session leader, l=multi-threaded, +=foreground, <=high priority, N=low priority
/proc/PID/status¶
grep -E 'Name|State|PPid|Threads|VmRSS|VmSwap|ctxt' /proc/1234/status
# VmRSS = physical memory used
# High nonvoluntary_ctxt_switches = CPU-bound (being preempted)
# High voluntary_ctxt_switches = I/O-bound (frequently waiting)
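For scripts that need a single value from the status file, a small wrapper around awk is often enough. The sketch below assumes Linux with /proc mounted; proc_field is a hypothetical helper name, while the field names are the ones the kernel prints.

```shell
# proc_field PID FIELD: print one field from /proc/PID/status (Linux only).
proc_field() {
    awk -v key="$2:" '
        $1 == key {
            $1 = ""                  # drop the "Field:" label
            sub(/^[ \t]+/, "")       # trim the separator left behind
            print
            exit
        }' "/proc/$1/status"
}

# Inspect the current shell.
proc_field $$ State
proc_field $$ voluntary_ctxt_switches
proc_field $$ nonvoluntary_ctxt_switches
```

Comparing the two context-switch counters over time, rather than reading them once, gives a better picture of whether a process is mostly waiting or mostly being preempted.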
nice and renice — Process Priority¶
Nice values range from -20 (highest priority) to +19 (lowest). Default is 0. Only root can set negative values.
nice -n 10 ./backup.sh # Start at lower priority
nice -n -5 ./critical-task.sh # Higher priority (root only)
renice 15 -p 1234 # Change running process
renice 10 -u backupuser # All processes by user
ps -o pid,ni,comm -p 1234 # Check current nice value
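One detail worth seeing in practice is that nice -n applies an adjustment relative to the caller's niceness rather than setting an absolute value. A minimal sketch, relying on the fact that nice with no arguments prints the niceness it is running at:

```shell
base=$(nice)                  # niceness of the current shell, usually 0
lowered=$(nice -n 5 nice)     # inner nice reports the niceness it inherited
echo "shell niceness: $base, child niceness: $lowered"
```

If the shell itself already runs at niceness 10, the child reports 15, and the result is capped at 19.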
Process Groups and Sessions¶
Session (SID)
+-- Process Group 1 (foreground job)
| +-- Process A (group leader)
| +-- Process B
+-- Process Group 2 (background job)
| +-- Process C (group leader)
+-- Session Leader (the shell)
ps -o pid,pgid,sid,comm # View group and session IDs
kill -TERM -$PGID # Signal entire group (negative PID)
Closing a terminal sends SIGHUP to the session leader (the shell), which then sends SIGHUP to the process groups of its jobs.
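Signalling a whole group can be tried safely with a throwaway job. The sketch below is written for a script (a non-interactive shell) and assumes setsid from util-linux is installed; in that setting setsid does not fork, so the PID in $! is also the new process group ID. The sleep commands are stand-ins for real work.

```shell
# Start a job with two workers in its own session and process group.
setsid sh -c 'sleep 300 & sleep 300 & wait' &
leader=$!
sleep 1                          # give setsid time to create the new group

# Negative PID: deliver SIGTERM to every process in the group.
kill -TERM "-$leader"

group_status=0
wait "$leader" || group_status=$?
echo "group leader exit status: $group_status"
```

At an interactive prompt the shell already places each background job in its own group, so setsid would fork there and $! would no longer identify the group; use the PGID column from ps in that case.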
Job Control¶
./task.sh & # Run in background
jobs -l # List jobs
# Ctrl+Z # Suspend foreground (sends SIGTSTP)
bg %1 # Resume in background
fg %1 # Bring to foreground
kill %1 # Kill by job number
disown %1 # Detach from shell (survives logout)
Surviving Terminal Disconnect¶
# nohup — ignores SIGHUP, redirects stdout
nohup ./task.sh > /var/log/task.log 2>&1 &
# disown — detach already-running job
./task.sh > /tmp/task.log 2>&1 &
disown %1
# systemd-run — modern, managed, logged
systemd-run --unit=my-task --remain-after-exit /opt/scripts/task.sh
journalctl -u my-task -f
Gotcha: In non-interactive shells (scripts, cron, CI pipelines), job control is disabled by default. Commands like fg and bg do not work, although jobs -p still lists background PIDs in bash. Background processes started with & in a script can be killed when the script exits if whatever launched the script then signals the whole process group or session (CI runners commonly do). Detach such processes with nohup, disown, or setsid. This catches people who test scripts interactively and then wonder why background processes die in cron.
Orphan and Zombie Processes¶
Orphans are running processes whose parent has died. The kernel reparents them to PID 1 (or to the nearest subreaper, if one is registered). Not inherently bad, but they indicate a supervision gap.
Zombies are dead processes whose parent has not called wait(). They consume no resources except a PID table slot. You cannot kill them (they are already dead). Fix the parent.
# Find zombies and their parents
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/ {print "Zombie:", $1, "Parent:", $2}'
# Kill the parent to let PID 1 reap the zombies
kill -TERM <parent_pid>
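To see the Z state first-hand, a zombie can be manufactured on purpose. This is a sketch for a Linux test machine, assuming /proc is mounted and SIGCHLD has its default disposition; the parent replaces itself with sleep, which never calls wait(), so its exited child has nobody to reap it.

```shell
pidfile=$(mktemp)
# The inner shell starts a short-lived child, records its PID, then execs
# into `sleep 30`, keeping the same PID but never waiting for the child.
sh -c 'sleep 1 & echo $! > "$0"; exec sleep 30' "$pidfile" &
parent=$!
sleep 2                                   # the short-lived child has exited by now

zombie=$(cat "$pidfile")
zombie_state=$(awk '/^State:/ { print $2 }' "/proc/$zombie/status")
echo "child $zombie of parent $parent is in state: $zombie_state"

# Terminating the parent orphans the zombie, so PID 1 can reap it.
kill -TERM "$parent"
wait "$parent" 2>/dev/null || true
rm -f "$pidfile"
```

The state reported for the child should be Z for as long as the parent is alive, and the entry disappears once the parent is gone and the zombie has been reaped.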
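In containers, PID 1 must reap children. If your application is not written to do that, run a minimal init such as tini or dumb-init as PID 1 and let it start the application.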
trap in Bash Scripts¶
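#!/bin/bash
set -euo pipefail
cleanup() {
    trap - SIGTERM SIGINT SIGHUP   # Avoid re-entering the handler
    echo "Shutting down..."
    # "|| true" keeps set -e from aborting cleanup when no jobs are left
    kill $(jobs -p) 2>/dev/null || true
    wait $(jobs -p) 2>/dev/null || true
    rm -f /var/run/my-service.pid
    exit 0
}
trap cleanup SIGTERM SIGINT SIGHUP
trap '' SIGPIPE # Ignore broken pipes
echo $$ > /var/run/my-service.pid
# Run app in background so trap can fire between waits
/usr/local/bin/my-app &
# "|| true" stops set -e from skipping cleanup if the app exits non-zero
wait $! || true
cleanup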
The key insight: trap handlers only fire when the shell is not executing a foreground command. Running the app with & and using wait lets the trap fire promptly on signal delivery.
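The interaction between wait and trap can be observed in isolation. In this sketch on_term, got_term, and worker are illustrative names, sleep 30 stands in for the application, and a background subshell plays the operator who sends SIGTERM after one second.

```shell
got_term=no
on_term() {
    got_term=yes
    kill -TERM "$worker" 2>/dev/null     # forward the signal to the child
}
trap on_term TERM

sleep 30 &                               # stand-in for the real application
worker=$!
( sleep 1; kill -TERM $$ ) &             # simulate an external `kill` on this script

first=0
wait "$worker" || first=$?               # interrupted by the trap: status above 128
final=0
wait "$worker" || final=$?               # second wait collects the child's own status
trap - TERM
echo "trap ran: $got_term, first wait: $first, child status: $final"
```

The first wait returns after about one second instead of thirty, which is the prompt delivery the pattern is designed to achieve. Had sleep 30 been run in the foreground, the handler would not have run until the sleep finished.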
Key Takeaways¶
- Signals are async notifications. SIGTERM asks politely, SIGKILL forces. Always SIGTERM first.
- SIGKILL and SIGSTOP cannot be caught. Everything else can be handled or ignored.
- SIGHUP means "reload config" for daemons and "terminal gone" for everything else.
- Process states: S is normal, D is stuck on I/O (unkillable), Z is a dead child awaiting reap.
- nice/renice control scheduling priority. Use them to protect production from batch work.
- Process groups let you signal entire pipelines. Sessions tie to terminal sessions.
- nohup/disown keep processes alive after logout. systemd-run is the modern alternative.
- Zombies cannot be killed. Fix or kill the parent and PID 1 reaps them.
- trap in bash implements graceful shutdown. Run children in background and use wait.