
Linux Signals & Process Control - Primer¶

Why This Matters¶

Every time you deploy a service, restart a daemon, or press Ctrl+C, you are sending signals. Signals are the kernel's mechanism for delivering asynchronous notifications to processes. They govern graceful shutdown, configuration reload, job control, and crash handling. If you do not understand signals, you cannot write proper shutdown handlers, debug stuck processes, or explain why kill -9 can leave your system in a broken state.

Timeline: Signals have been part of Unix since the very first version (1971). The original Unix V1 had only a handful of signals. The modern signal set was standardized by POSIX.1 (IEEE 1003.1, 1988) and has remained stable since. Signal numbers vary between architectures: on Linux, x86 and ARM share one numbering, while MIPS, SPARC, and Alpha assign different numbers to some signals (SIGUSR1, SIGCHLD, and SIGSTOP among them). The names and semantics are consistent across all POSIX systems. The kill -l command shows the signal numbers for your specific platform.

What Signals Are¶

A signal is an asynchronous notification sent to a process by the kernel, by another process, or by the process itself. When a signal arrives, the process can:

Handle it — run a registered signal handler function
Ignore it — explicitly choose to do nothing
Take the default action — whatever the kernel does if no handler is registered (usually terminate or ignore)

Two signals cannot be caught, blocked, or ignored: SIGKILL (9) and SIGSTOP (19 on x86 and ARM). The kernel enforces these directly. Everything else is negotiable.

# List all signals on your system
kill -l
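
To make the three dispositions concrete, here is a minimal shell sketch (an illustration, not part of the original primer). It assumes a POSIX shell on Linux; the variable names are arbitrary. A small child shell handles SIGUSR1 and ignores SIGUSR2, and a plain sleep process shows the default action for SIGTERM.

```shell
# Handle and ignore: the inner shell traps USR1 and ignores USR2.
dispositions=$(sh -c '
    trap "echo handled-USR1" USR1   # handle: run our own code
    trap "" USR2                    # ignore: an empty handler discards the signal
    kill -USR2 $$                   # would normally terminate the shell
    kill -USR1 $$                   # runs the handler
    echo still-running
')
echo "$dispositions"

# Default action: sleep installs no handler, so SIGTERM terminates it.
sleep 30 &
child=$!
kill -TERM "$child"
child_status=0
wait "$child" || child_status=$?    # the shell reports 128 + signal number
echo "child exit status: $child_status"
```

The inner shell survives SIGUSR2 only because it asked to ignore it; without that trap line the default action would have terminated it.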

The Signals You Must Know¶

Signal	Number	Default	Catchable?	Purpose
SIGTERM	15	Terminate	Yes	Polite shutdown request
SIGKILL	9	Terminate	No	Unconditional kill
SIGINT	2	Terminate	Yes	Ctrl+C — interrupt
SIGHUP	1	Terminate	Yes	Config reload / terminal hangup
SIGUSR1	10	Terminate	Yes	Application-defined
SIGUSR2	12	Terminate	Yes	Application-defined
SIGSTOP	19	Stop	No	Unconditional pause
SIGCONT	18	Continue	Yes	Resume stopped process
SIGCHLD	17	Ignore	Yes	Child state change (enables reaping)
SIGPIPE	13	Terminate	Yes	Write to broken pipe/socket
SIGSEGV	11	Core dump	Yes	Segmentation fault

SIGTERM vs SIGKILL¶

SIGTERM says "please shut down." The process can catch it, flush buffers, close connections, remove lock files, and exit cleanly. Docker, Kubernetes, and systemd all send SIGTERM first.

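SIGKILL says "you are dead now." The kernel terminates the process without running any of its code. No handler runs. No userspace cleanup happens. The kernel still closes file descriptors and frees memory, but unflushed buffers are lost, and lock files, temp files, and System V shared memory segments are all left behind.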

Remember: The signal escalation sequence in production is TERM, wait, KILL. Always send SIGTERM first and give the process time to clean up (Kubernetes default: 30 seconds via terminationGracePeriodSeconds). Only send SIGKILL as a last resort. The mnemonic: "Ask politely (15), then execute (9)." A process killed by a signal is reported with exit status 128 + signal number: SIGTERM (15) gives 143, SIGKILL (9) gives 137.

kill 1234            # Sends SIGTERM (default)
kill -9 1234         # SIGKILL — last resort only
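
The TERM, wait, KILL sequence can be sketched as a small shell function. This is an illustrative helper, not a standard tool: the name stop_gracefully, its arguments, and the one-second polling interval are assumptions made for the example.

```shell
# Hypothetical helper: stop_gracefully PID [GRACE_SECONDS]
stop_gracefully() {
    target=$1
    grace=${2:-30}                                  # mirrors the Kubernetes default
    kill -TERM "$target" 2>/dev/null || return 0    # already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$target" 2>/dev/null || return 0   # exited within the grace period
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$target" 2>/dev/null || true        # last resort
}

# A well-behaved process exits on SIGTERM.
sleep 30 &
polite=$!
stop_gracefully "$polite" 5
polite_status=0
wait "$polite" || polite_status=$?

# A process that ignores SIGTERM has to be killed.
sh -c 'trap "" TERM; exec sleep 30' &
stubborn=$!
sleep 1                                             # let it install the ignore first
stop_gracefully "$stubborn" 2
stubborn_status=0
wait "$stubborn" || stubborn_status=$?

echo "polite=$polite_status stubborn=$stubborn_status"
```

The two exit statuses differ by which signal ended the process, which is a quick way to tell from logs whether a service shut down cleanly or was force-killed.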

SIGHUP — Reload Without Restart¶

Modern daemons repurpose SIGHUP as "reload your configuration." This avoids dropping active connections during a restart.

kill -HUP $(cat /var/run/nginx.pid)   # nginx reloads config
kill -HUP $(pidof haproxy)            # HAProxy reloads
systemctl reload nginx                 # Same thing via systemd

When you close a terminal, the kernel sends SIGHUP to the shell, and the shell resends it to its jobs before exiting. This is why background processes die when the terminal goes away.
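
As a rough sketch of what "reload on SIGHUP" looks like from the daemon's side, the following toy service re-reads a config file whenever it receives SIGHUP. Everything here is hypothetical: the load function, the level setting, and the temp files stand in for a real daemon's configuration.

```shell
conf=$(mktemp)
log=$(mktemp)
echo "level=info" > "$conf"

# Toy daemon: loads its config at startup and again on every SIGHUP.
sh -c '
    conf=$1
    load() { . "$conf"; echo "loaded level=$level"; }
    trap load HUP
    trap "exit 0" TERM
    load
    while :; do sleep 1; done
' reload-demo "$conf" > "$log" &
daemon=$!

sleep 1
echo "level=debug" > "$conf"     # edit the config...
kill -HUP "$daemon"              # ...and ask the daemon to reload, no restart
sleep 2
kill -TERM "$daemon"
wait "$daemon" || true

reloads=$(grep -c '^loaded' "$log")
last_line=$(tail -n 1 "$log")
cat "$log"
```

The process ID never changes across the reload, which is the property that lets real daemons keep their listening sockets and active connections.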

SIGUSR1/2, SIGSTOP/CONT, SIGCHLD, SIGPIPE¶

SIGUSR1/2 have no predefined meaning. Applications define their own behavior (toggle debug logging, dump state, reopen log files). GNU dd reports progress on SIGUSR1.

SIGSTOP freezes a process (uncatchable). SIGCONT resumes it. Useful during incidents: freeze a misbehaving process to stop damage without losing its state.

SIGCHLD notifies a parent when a child exits, stops, or resumes. A parent that calls wait(), typically from its SIGCHLD handler, reaps the child and prevents zombies.

SIGPIPE fires when a process writes to a pipe or socket whose reader has closed. The default action terminates the writer, so many network services ignore or handle it to avoid dying when clients disconnect.
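
SIGPIPE is easy to observe in an ordinary pipeline. In this sketch, yes writes forever while head exits after one line; the next write hits a closed pipe. The status file is only a portable way to capture the writer's exit status, and the example assumes SIGPIPE has not been set to ignored by whatever launched the shell.

```shell
status_file=$(mktemp)

# The left side records the exit status of yes once it stops.
{ yes; echo $? > "$status_file"; } 2>/dev/null | head -n 1 > /dev/null

pipe_status=$(cat "$status_file")
echo "yes exited with status $pipe_status"   # 128 + 13 when killed by SIGPIPE
rm -f "$status_file"
```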

The kill Command Family¶

# kill — by PID
kill PID              # SIGTERM
kill -9 PID           # SIGKILL
kill -HUP PID         # SIGHUP
kill -0 PID           # Test that the PID exists and you may signal it (no signal sent)
kill -TERM -PGID      # Signal entire process group (negative PID)

# pgrep — find by pattern
pgrep nginx               # PIDs matching name
pgrep -a nginx            # PIDs + full command
pgrep -u www-data         # By user
pgrep -f "python app.py"  # Match full command line
pgrep -P 1234             # Children of PID 1234

# pkill — signal by pattern (safer than ps|grep|kill)
pkill -TERM nginx
pkill -HUP -f "gunicorn"
pkill -9 -u baduser
pkill -TERM -P 1234       # Kill children of PID 1234

Process States¶

State	Code	Meaning	Impact
Running	R	On CPU or in run queue	Normal
Sleeping (interruptible)	S	Waiting for event, can be signaled	Normal — most processes live here
Sleeping (uninterruptible)	D	Waiting for I/O, cannot be signaled	Cannot be killed — fix the I/O subsystem
Stopped	T	Paused by SIGSTOP/SIGTSTP	Job control, debugging
Zombie	Z	Exited, parent has not called wait()	Holds a PID slot, cannot be killed

Reading ps STAT Column¶

ps aux
# STAT column: Ss = sleeping + session leader, R+ = running + foreground
# Modifiers: s=session leader, l=multi-threaded, +=foreground, <=high priority, N=low priority

/proc/PID/status¶

grep -E 'Name|State|PPid|Threads|VmRSS|VmSwap|ctxt' /proc/1234/status
# VmRSS = physical memory used
# High nonvoluntary_ctxt_switches = CPU-bound (being preempted)
# High voluntary_ctxt_switches = I/O-bound (frequently waiting)
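
The state codes from the table can be watched changing by reading the State field of /proc/PID/status while stopping and resuming a process. This is a small sketch for Linux; proc_state is a hypothetical helper written for the example, and the one-second pauses simply give the kernel time to apply each signal.

```shell
# Hypothetical helper: print the one-letter state code from /proc/PID/status.
proc_state() {
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
}

sleep 30 &
pid=$!
sleep 1                                 # let the child exec and block in sleep
state_sleeping=$(proc_state "$pid")     # interruptible sleep
kill -STOP "$pid"
sleep 1
state_stopped=$(proc_state "$pid")      # stopped
kill -CONT "$pid"
sleep 1
state_resumed=$(proc_state "$pid")      # interruptible sleep again
kill -TERM "$pid"
wait "$pid" || true

echo "$state_sleeping -> $state_stopped -> $state_resumed"
```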

nice and renice — Process Priority¶

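Nice values range from -20 (highest priority) to +19 (lowest). Default is 0. By default, unprivileged users can only raise a nice value (lower the priority); setting a negative value or lowering an existing one requires root (CAP_SYS_NICE).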

nice -n 10 ./backup.sh            # Start at lower priority
nice -n -5 ./critical-task.sh     # Higher priority (root only)
renice 15 -p 1234                 # Change running process
renice 10 -u backupuser           # All processes by user
ps -o pid,ni,comm -p 1234         # Check current nice value
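
One detail worth seeing in practice: the value given to nice -n is an adjustment relative to the caller's current niceness, not an absolute setting. A minimal sketch, relying on the fact that nice prints the current niceness when run without a command:

```shell
nice_now=$(nice)                  # niceness of this shell, normally 0
nice_lowered=$(nice -n 10 nice)   # the inner nice reports its own niceness
echo "current=$nice_now after 'nice -n 10'=$nice_lowered"
```

If the shell is already running at niceness 5, the second value is 15 rather than 10, and the result is capped at 19.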

Process Groups and Sessions¶

Session (SID)
+-- Process Group 1 (foreground job)
|   +-- Process A (group leader)
|   +-- Process B
+-- Process Group 2 (background job)
|   +-- Process C (group leader)
+-- Session Leader (the shell)

ps -o pid,pgid,sid,comm          # View group and session IDs
kill -TERM -$PGID                # Signal entire group (negative PID)

Closing a terminal makes the kernel send SIGHUP to the session leader (normally the shell). The shell then resends SIGHUP to its jobs, running or stopped, before it exits.
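
Signalling a whole group with a negative PID can be tried safely by putting a few processes in their own group first. The sketch below uses setsid for that and assumes it runs from a non-interactive shell, where setsid replaces itself with the command so that $! is the group leader. The file and variable names are illustrative.

```shell
pidfile=$(mktemp)

# The inner shell leads a new session and process group; its children join that group.
setsid sh -c 'sleep 30 & echo $! > "$1"; sleep 30 & wait' group-demo "$pidfile" &
leader=$!
sleep 1

member=$(cat "$pidfile")
member_pgid=$(cut -d ' ' -f 5 "/proc/$member/stat")   # field 5 of /proc/PID/stat is the PGID
echo "leader=$leader member=$member member_pgid=$member_pgid"

kill -TERM "-$leader"        # negative PID: signal every process in the group
leader_status=0
wait "$leader" || leader_status=$?
rm -f "$pidfile"
```

One kill call reaches the leader and both sleep processes, which is the same mechanism the shell relies on when Ctrl+C interrupts an entire pipeline.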

Job Control¶

./task.sh &          # Run in background
jobs -l              # List jobs
# Ctrl+Z             # Suspend foreground (sends SIGTSTP)
bg %1                # Resume in background
fg %1                # Bring to foreground
kill %1              # Kill by job number
disown %1            # Detach from shell (survives logout)

Surviving Terminal Disconnect¶

# nohup — ignores SIGHUP, redirects stdout
nohup ./task.sh > /var/log/task.log 2>&1 &

# disown — detach already-running job
./task.sh > /tmp/task.log 2>&1 &
disown %1

# systemd-run — modern, managed, logged
systemd-run --unit=my-task --remain-after-exit /opt/scripts/task.sh
journalctl -u my-task -f

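Gotcha: In non-interactive shells (scripts, cron, CI pipelines), job control is disabled by default, so fg and bg do not work. Background processes started with & in a script are not killed by the shell when the script exits; they are orphaned and keep running. They often die anyway because whatever launched the script (a CI runner, a systemd unit, an SSH session closing its terminal) tears down the processes left in that session or cgroup. Use setsid, nohup, or systemd-run when a process must outlive the script, and verify it under the real launcher. This catches people who test scripts interactively and then wonder why background processes die in cron.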
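
What nohup actually changes can be checked by sending SIGHUP to two otherwise identical processes. A minimal sketch with illustrative variable names:

```shell
sleep 30 &
plain=$!
nohup sleep 30 > /dev/null 2>&1 &
protected=$!
sleep 1                                   # give nohup time to exec sleep

kill -HUP "$plain" "$protected"
plain_status=0
wait "$plain" || plain_status=$?          # terminated by SIGHUP

# The nohup'd process ignored SIGHUP, so it is still there to receive SIGTERM.
kill -TERM "$protected"
protected_status=0
wait "$protected" || protected_status=$?

echo "plain=$plain_status protected=$protected_status"
```

Note that nohup only protects against SIGHUP. The second process still dies on SIGTERM, so it offers no protection against a launcher that cleans up with other signals.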

Orphan and Zombie Processes¶

Orphans are running processes whose parent has died. The kernel reparents them to PID 1, or to the nearest subreaper if one is registered. Not inherently bad, but they indicate a supervision gap.

Zombies are dead processes whose parent has not called wait(). They consume no resources except a PID table slot. You cannot kill them (they are already dead). Fix the parent.

# Find zombies and their parents
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/ {print "Zombie:", $1, "Parent:", $2}'

# Kill the parent to let PID 1 reap the zombies
kill -TERM <parent_pid>

In containers, PID 1 must reap children. Use tini or dumb-init:

ENTRYPOINT ["tini", "--"]
CMD ["python", "app.py"]
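
A zombie is easy to produce on purpose, which helps when testing monitoring or an init wrapper. In this sketch an inner shell starts a short-lived child and then replaces itself with sleep, a program that never calls wait(). The helper proc_state and the file names are assumptions made for the example; it reads Linux's /proc.

```shell
# Hypothetical helper: print the one-letter state code from /proc/PID/status.
proc_state() {
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
}

childfile=$(mktemp)
# After exec, the parent is a plain sleep process that will never reap its child.
sh -c 'sleep 1 & echo $! > "$1"; exec sleep 5' zombie-demo "$childfile" &
parent=$!
sleep 2                                      # the child has exited by now

zombie=$(cat "$childfile")
zombie_state=$(proc_state "$zombie")
kill -KILL "$zombie" 2>/dev/null || true     # no effect: it is already dead
state_after_kill=$(proc_state "$zombie")
echo "before kill: $zombie_state, after kill -9: $state_after_kill"

kill -TERM "$parent"                         # removing the parent hands the zombie to a reaper
wait "$parent" || true
rm -f "$childfile"
```

Whether the zombie disappears after the parent dies depends on the new parent reaping it, which is exactly the job tini or dumb-init does as PID 1 in a container.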

trap in Bash Scripts¶

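#!/bin/bash
set -euo pipefail

cleanup() {
    trap - SIGTERM SIGINT SIGHUP    # Prevent re-entry if another signal arrives
    echo "Shutting down..."
    local pids
    pids=$(jobs -p)
    if [ -n "$pids" ]; then
        # || true keeps set -e from aborting cleanup when a job is already gone
        kill $pids 2>/dev/null || true
        wait $pids 2>/dev/null || true
    fi
    rm -f /var/run/my-service.pid
    exit 0
}

trap cleanup SIGTERM SIGINT SIGHUP
trap '' SIGPIPE    # Ignore broken pipes

echo $$ > /var/run/my-service.pid

# Run app in background so trap can fire between waits
/usr/local/bin/my-app &
# || true: under set -e, a non-zero exit from the app would otherwise skip cleanup
wait $! || true
cleanup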

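The key insight: the shell runs a trap handler only after the current foreground command finishes, so a long-running foreground app delays the handler until the app exits. The wait builtin is the exception: a trapped signal interrupts it immediately. Running the app with & and using wait lets the trap fire promptly on signal delivery.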
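
The difference is measurable. The sketch below sends SIGTERM to two tiny scripts one second after they start and times how long each takes to exit. The helper term_delay and the 4-second sleep are arbitrary choices for the demonstration, and the timings are whole seconds, so treat the numbers as approximate.

```shell
# Hypothetical helper: start a script, send SIGTERM after 1 second,
# and print how many seconds the script then took to exit.
term_delay() {
    sh -c "$1" &
    target=$!
    sleep 1
    sent=$(date +%s)
    kill -TERM "$target"
    wait "$target" || true
    echo $(( $(date +%s) - sent ))
}

# Foreground child: the handler cannot run until sleep returns.
fg_delay=$(term_delay 'trap "exit 0" TERM; sleep 4; :')

# Background child plus wait: the signal interrupts wait, so the handler runs at once.
bg_delay=$(term_delay 'trap "kill \$!; exit 0" TERM; sleep 4 & wait $!')

echo "foreground: ${fg_delay}s, background+wait: ${bg_delay}s"
```

With a 30-second grace period and an app that runs for hours in the foreground, the first pattern means the handler never gets a chance to run before SIGKILL arrives.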

Key Takeaways¶

Signals are async notifications. SIGTERM asks politely, SIGKILL forces. Always SIGTERM first.
SIGKILL and SIGSTOP cannot be caught. Everything else can be handled or ignored.
SIGHUP means "reload config" for daemons and "terminal gone" for everything else.
Process states: S is normal, D is stuck on I/O (unkillable), Z is a dead child awaiting reap.
nice/renice control scheduling priority. Use them to protect production from batch work.
Process groups let you signal entire pipelines. Sessions tie to terminal sessions.
nohup/disown keep processes alive after logout. systemd-run is the modern alternative.
Zombies cannot be killed. Fix or kill the parent and PID 1 reaps them.
trap in bash implements graceful shutdown. Run children in background and use wait.

Prerequisites¶

Linux Ops (Topic Pack, L0)

/proc Filesystem (Topic Pack, L2) — Linux Fundamentals
Advanced Bash for Ops (Topic Pack, L1) — Linux Fundamentals
Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Linux Fundamentals
Bash Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Linux Fundamentals
Case Study: CI Pipeline Fails — Docker Layer Cache Corruption (Case Study, L2) — Linux Fundamentals
Case Study: Container Vuln Scanner False Positive Blocks Deploy (Case Study, L2) — Linux Fundamentals
Case Study: Disk Full Root Services Down (Case Study, L1) — Linux Fundamentals
Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention (Case Study, L2) — Linux Fundamentals
Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP (Case Study, L2) — Linux Fundamentals
Case Study: Inode Exhaustion (Case Study, L1) — Linux Fundamentals
