Linux Signals & Process Control - Primer

Why This Matters

Every time you deploy a service, restart a daemon, or press Ctrl+C, you are sending signals. Signals are the kernel's mechanism for delivering asynchronous notifications to processes. They govern graceful shutdown, configuration reload, job control, and crash handling. If you do not understand signals, you cannot write proper shutdown handlers, you cannot debug stuck processes, and you cannot explain why kill -9 can leave your system in a broken state.


Timeline: Signals have been part of Unix since the first edition (1971), which had only a handful of them. POSIX.1 (IEEE 1003.1, 1988) standardized the modern signal set, and it has remained largely stable since. Signal numbers vary between architectures: x86 and ARM share one numbering, while MIPS, SPARC, and Alpha use different values for several signals (SIGUSR1 and SIGCHLD, for example). The names and semantics are consistent across POSIX systems. The kill -l command shows the signal numbers for your specific platform.

What Signals Are

A signal is an asynchronous notification sent to a process by the kernel, by another process, or by the process itself. When a signal arrives, the process can:

  1. Handle it: run a registered signal handler function
  2. Ignore it: explicitly choose to do nothing
  3. Take the default action: whatever the kernel does if no handler is registered (terminate, terminate with a core dump, ignore, stop, or continue, depending on the signal)

Two signals cannot be caught, blocked, or ignored: SIGKILL (9) and SIGSTOP (19 on x86 and ARM). The kernel enforces these directly. Everything else is negotiable.

# List all signals on your system
kill -l
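
The three dispositions can be tried from a shell with trap. This is a minimal sketch, assuming bash is installed. The variable names are illustrative, and the demo runs inside a child bash so its traps do not affect your current shell.

```shell
# Run the demo in a child bash so its traps do not leak into the caller.
disposition_demo=$(bash -c '
    handled=no
    trap "handled=yes" USR1   # 1. Handle: run this code when SIGUSR1 arrives
    trap "" USR2              # 2. Ignore: an empty handler discards SIGUSR2
                              # 3. Default: every other signal is left alone

    kill -USR1 $$             # Handler runs, the shell keeps going
    kill -USR2 $$             # Discarded; without the trap this would terminate the shell
    echo "handled=$handled"
')
echo "$disposition_demo"
```

Without either trap, the default action for SIGUSR1 and SIGUSR2 is to terminate, so the final echo would never run.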

The Signals You Must Know

Signal    Number  Default    Catchable?  Purpose
SIGTERM   15      Terminate  Yes         Polite shutdown request
SIGKILL   9       Terminate  No          Unconditional kill
SIGINT    2       Terminate  Yes         Ctrl+C (interrupt)
SIGHUP    1       Terminate  Yes         Config reload / terminal hangup
SIGUSR1   10      Terminate  Yes         Application-defined
SIGUSR2   12      Terminate  Yes         Application-defined
SIGSTOP   19      Stop       No          Unconditional pause
SIGCONT   18      Continue   Yes         Resume stopped process
SIGCHLD   17      Ignore     Yes         Child state change (enables reaping)
SIGPIPE   13      Terminate  Yes         Write to broken pipe/socket
SIGSEGV   11      Core dump  Yes         Segmentation fault

SIGTERM vs SIGKILL

SIGTERM says "please shut down." The process can catch it, flush buffers, close connections, remove lock files, and exit cleanly. Docker, Kubernetes, and systemd all send SIGTERM first.

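SIGKILL says "you are dead now." The kernel terminates the process without ever returning control to it. No handler runs and no userspace cleanup happens. The kernel still closes file descriptors and frees the process's memory, but unflushed buffers are lost, and lock files, temp files, and System V shared memory segments are left behind.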

Remember: The signal escalation sequence in production: TERM, wait, KILL. Always send SIGTERM first and give the process time to clean up (Kubernetes default: 30 seconds via terminationGracePeriodSeconds). Only send SIGKILL as a last resort. The mnemonic: "Ask politely (15), then execute (9)." Exit code = 128 + signal number: SIGTERM (15) = 143, SIGKILL (9) = 137.

kill 1234            # Sends SIGTERM (default)
kill -9 1234         # SIGKILL — last resort only
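
The TERM, wait, KILL sequence and the 128 + signal exit status can be sketched in bash as follows. The helper name stop_gracefully and its arguments are hypothetical, not a standard tool, and the short grace periods are only there to keep the demo quick.

```shell
# stop_gracefully PID [GRACE_SECONDS]
# Hypothetical helper: SIGTERM first, SIGKILL only if the grace period runs out.
stop_gracefully() {
    local pid=$1 grace=${2:-30} waited=0
    kill -TERM "$pid" 2>/dev/null || return 0       # Already gone
    while [ "$waited" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0      # Exited within the grace period
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$pid" 2>/dev/null || true           # Last resort
}

# A process with default dispositions dies on SIGTERM: status 128 + 15.
sleep 60 &
polite_pid=$!
stop_gracefully "$polite_pid" 5
polite_status=0
wait "$polite_pid" || polite_status=$?

# A process that ignores SIGTERM has to be killed: status 128 + 9.
bash -c 'trap "" TERM; exec sleep 60' &
stubborn_pid=$!
stop_gracefully "$stubborn_pid" 2
stubborn_status=0
wait "$stubborn_pid" || stubborn_status=$?

echo "polite=$polite_status stubborn=$stubborn_status"
```

The status comes from the shell's wait builtin, which is where the 128 + N convention is applied.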

SIGHUP — Reload Without Restart

Modern daemons repurpose SIGHUP as "reload your configuration." This avoids dropping active connections during a restart.

kill -HUP $(cat /var/run/nginx.pid)   # nginx reloads config
kill -USR2 $(cat /run/haproxy.pid)    # HAProxy master-worker mode reloads on SIGUSR2, not SIGHUP (pid file path varies)
systemctl reload nginx                 # Same thing via systemd

When you close a terminal, the kernel sends SIGHUP to the session leader (usually the shell), and the shell forwards it to its jobs. This is why background processes die on logout.
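
The reload-on-SIGHUP pattern can be sketched as a toy bash "daemon" that re-reads a file when signaled. Everything here is illustrative: the greeting key, the temp file, and the load_config function are assumptions made for the demo, and real daemons such as nginx do considerably more during a reload.

```shell
conf=$(mktemp)
echo "greeting=hello" > "$conf"

reload_output=$(bash -c '
    conf=$1
    load_config() { greeting=$(sed -n "s/^greeting=//p" "$conf"); }

    trap load_config HUP                 # Re-read the file on SIGHUP
    load_config
    echo "before: $greeting"

    echo "greeting=goodbye" > "$conf"    # Stand-in for an operator editing the file
    kill -HUP $$                         # Stand-in for: kill -HUP <daemon pid>
    echo "after: $greeting"
' reload-demo "$conf")

echo "$reload_output"
rm -f "$conf"
```

The process keeps its PID and its state across the reload, which is the whole point of reloading instead of restarting.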

SIGUSR1/2, SIGSTOP/CONT, SIGCHLD, SIGPIPE

SIGUSR1/2 have no predefined meaning. Applications define their own behavior (toggle debug logging, dump state, reopen log files). dd reports progress on SIGUSR1.

SIGSTOP freezes a process (uncatchable). SIGCONT resumes it. Useful during incidents: freeze a misbehaving process to stop damage without losing its state.
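
Freezing and resuming can be observed through the process state. This is a small sketch assuming a Linux /proc filesystem; proc_state is a hypothetical helper, and the sleep process stands in for the misbehaving one.

```shell
# Read the one-letter state code (R, S, D, T, Z) of a PID from /proc.
proc_state() { awk '/^State:/ {print $2}' "/proc/$1/status"; }

sleep 60 &
target=$!

kill -STOP "$target"
sleep 1                                   # Give the kernel a moment to stop it
frozen_state=$(proc_state "$target")      # T (stopped)

kill -CONT "$target"
sleep 1
resumed_state=$(proc_state "$target")     # S (sleeping) for an idle sleep

kill -TERM "$target"
wait "$target" 2>/dev/null || true
echo "frozen=$frozen_state resumed=$resumed_state"
```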

SIGCHLD notifies a parent when a child exits, stops, or resumes. A parent reaps exited children by calling wait(), often from a SIGCHLD handler, and that is what clears zombies.

SIGPIPE fires when writing to a pipe whose reader has closed. Many services must handle it to avoid dying when clients disconnect.
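
A quick way to see SIGPIPE is a pipeline whose reader quits early. This is a sketch assuming bash (PIPESTATUS is a bash feature); the variable name is illustrative. With the default disposition the writer's status is typically 141 (128 + 13). If SIGPIPE is ignored in your environment, the writer gets a write error instead and exits with a different non-zero status.

```shell
# yes writes forever; head exits after one line and closes the pipe.
writer_status=$(bash -c 'yes | head -n 1 > /dev/null; echo "${PIPESTATUS[0]}"')
echo "writer exit status: $writer_status"
```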


The kill Command Family

# kill — by PID
kill PID              # SIGTERM
kill -9 PID           # SIGKILL
kill -HUP PID         # SIGHUP
kill -0 PID           # Test if alive (no signal sent)
kill -TERM -PGID      # Signal entire process group (negative PID)

# pgrep — find by pattern
pgrep nginx               # PIDs matching name
pgrep -a nginx            # PIDs + full command
pgrep -u www-data         # By user
pgrep -f "python app.py"  # Match full command line
pgrep -P 1234             # Children of PID 1234

# pkill — signal by pattern (safer than ps|grep|kill)
pkill -TERM nginx
pkill -HUP -f "gunicorn"
pkill -9 -u baduser
pkill -TERM -P 1234       # Kill children of PID 1234
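
The -P filter is worth trying on a throwaway process tree before using it in production. This is a sketch assuming pgrep and pkill from procps; the parent here is a hypothetical sh that starts two sleep children and waits for them.

```shell
# Parent with two children; the parent just waits for them.
sh -c 'sleep 60 & sleep 60 & wait' > /dev/null 2>&1 &
parent=$!
sleep 1                                        # Let the children start

child_count=$(pgrep -P "$parent" | wc -l)      # PIDs whose parent is $parent
pkill -TERM -P "$parent"                       # Signal only those children

wait "$parent" 2>/dev/null || true             # Parent exits once both are gone
remaining=$(pgrep -P "$parent" | wc -l)
echo "children before: $child_count, after: $remaining"
```

Only the children are signaled. The parent exits on its own because its wait returns once both children are gone.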

Process States

State                       Code  Meaning                               Impact
Running                     R     On CPU or in run queue                Normal
Sleeping (interruptible)    S     Waiting for event, can be signaled    Normal; most processes live here
Sleeping (uninterruptible)  D     Waiting for I/O, cannot be signaled   Cannot be killed; fix the I/O subsystem
Stopped                     T     Paused by SIGSTOP/SIGTSTP             Job control, debugging
Zombie                      Z     Exited, parent has not called wait()  Holds a PID slot, cannot be killed

Reading ps STAT Column

ps aux
# STAT column: Ss = sleeping + session leader, R+ = running + foreground
# Modifiers: s=session leader, l=multi-threaded, +=foreground, <=high priority, N=low priority

/proc/PID/status

grep -E 'Name|State|PPid|Threads|VmRSS|VmSwap|ctxt' /proc/1234/status
# VmRSS = physical memory used
# High nonvoluntary_ctxt_switches = CPU-bound (being preempted)
# High voluntary_ctxt_switches = I/O-bound (frequently waiting)
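
The same fields can be pulled into a one-line summary. This is a sketch assuming Linux /proc and awk; proc_summary is a hypothetical helper, and comparing the two counters directly is only a crude heuristic, not a diagnosis.

```shell
# Print state and context-switch counters for a PID (defaults to this shell).
proc_summary() {
    local pid=${1:-$$}
    awk -F':[ \t]*' '
        $1 == "State"                       { print "state=" substr($2, 1, 1) }
        $1 == "voluntary_ctxt_switches"     { vol = $2 }
        $1 == "nonvoluntary_ctxt_switches"  { nonvol = $2 }
        END {
            # Rough rule of thumb from the comments above
            hint = (nonvol > vol) ? "cpu-bound" : "io-bound-or-idle"
            print "voluntary=" vol, "nonvoluntary=" nonvol
            print "hint=" hint
        }
    ' "/proc/$pid/status"
}

summary=$(proc_summary "$$")
echo "$summary"
```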

nice and renice — Process Priority

Nice values range from -20 (highest priority) to +19 (lowest). Default is 0. By default only root (or a process with CAP_SYS_NICE) can set negative values or lower a nice value that has already been raised.

nice -n 10 ./backup.sh            # Start at lower priority
nice -n -5 ./critical-task.sh     # Higher priority (root only)
renice 15 -p 1234                 # Change running process
renice 10 -u backupuser           # All processes by user
ps -o pid,ni,comm -p 1234         # Check current nice value
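
One detail that is easy to miss: the value given to nice -n is an adjustment relative to the caller's current niceness, not an absolute setting. This is a small sketch assuming GNU coreutils nice, which prints the current niceness when run without a command.

```shell
base=$(nice)                  # With no arguments, nice prints the current niceness
raised=$(nice -n 5 nice)      # The inner nice reports the value it was started with
echo "base=$base raised=$raised"
```

From a shell at niceness 0 the second value is 5. From a shell already at 10 it would be 15, and the result is capped at 19.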

Process Groups and Sessions

Session (SID)
+-- Process Group 1 (foreground job)
|   +-- Process A (group leader)
|   +-- Process B
+-- Process Group 2 (background job)
|   +-- Process C (group leader)
+-- Session Leader (the shell)

ps -o pid,pgid,sid,comm          # View group and session IDs
kill -TERM -$PGID                # Signal entire group (negative PID)
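
Signaling a whole group can be tried on a throwaway pipeline. This is a sketch under several assumptions: it runs from a non-interactive bash script (job control off, so setsid does not need to fork), and setsid from util-linux and ps from procps are available. The variable names are illustrative.

```shell
# setsid starts the pipeline in a new session, so it gets its own process group.
setsid sh -c 'sleep 60 | sleep 60' > /dev/null 2>&1 &
leader=$!
sleep 1                                              # Let the pipeline start

pgid=$(ps -o pgid= -p "$leader" | tr -d ' ')         # Group ID of the leader
members=$(ps -eo pgid= | grep -cw "$pgid")           # Processes in that group

kill -TERM -- "-$pgid"                               # Negative PID = whole group
group_status=0
wait "$leader" || group_status=$?
echo "members=$members leader_status=$group_status"
```

The -- stops kill from parsing the negative number as an option. One kill call reaches the sh and both sleep processes.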

Closing a terminal sends SIGHUP to the session leader, which propagates to all process groups.


Job Control

./task.sh &          # Run in background
jobs -l              # List jobs
# Ctrl+Z             # Suspend foreground (sends SIGTSTP)
bg %1                # Resume in background
fg %1                # Bring to foreground
kill %1              # Kill by job number
disown %1            # Detach from shell (survives logout)

Surviving Terminal Disconnect

# nohup — ignores SIGHUP, redirects stdout
nohup ./task.sh > /var/log/task.log 2>&1 &

# disown — detach already-running job
./task.sh > /tmp/task.log 2>&1 &
disown %1

# systemd-run — modern, managed, logged
systemd-run --unit=my-task --remain-after-exit /opt/scripts/task.sh
journalctl -u my-task -f

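Gotcha: In non-interactive shells (scripts, cron, CI pipelines), job control is disabled by default, so fg and bg fail with "no job control". A script also does not wait for processes it started with &: when the script exits they are orphaned, and what happens next depends on the environment. A systemd service or a CI runner typically kills everything left in the cgroup or process tree, and a closing terminal delivers SIGHUP. Call wait if the script should outlive its children, or detach them deliberately with nohup, setsid, or systemd-run. This catches people who test scripts interactively and then wonder why background processes die under cron or CI.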

Orphan and Zombie Processes

Orphans are running processes whose parent has died. The kernel reparents them to PID 1 (or to the nearest subreaper, if one is set). Not inherently bad, but they can indicate a supervision gap.

Zombies are dead processes whose parent has not called wait(). They consume no resources except a PID table slot. You cannot kill them (they are already dead). Fix the parent.

# Find zombies and their parents
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/ {print "Zombie:", $1, "Parent:", $2}'

# Kill the parent to let PID 1 reap the zombies
kill -TERM <parent_pid>
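
To see a zombie safely, you can create one on purpose. This is a sketch assuming procps ps: the sh parent starts a short-lived child and then replaces itself with sleep, which never calls wait(), so the child stays in state Z until the parent goes away.

```shell
# The child (sleep 1) exits quickly; the parent execs into sleep and never calls wait().
sh -c 'sleep 1 & exec sleep 60' > /dev/null 2>&1 &
parent=$!
sleep 2                                   # Let the child exit

zombies=$(ps -eo pid,ppid,stat,comm | awk -v p="$parent" '$2 == p && $3 ~ /^Z/' | wc -l)
echo "zombie children of $parent: $zombies"

# Terminating the parent reparents the zombie to PID 1 (or a subreaper), which reaps it.
kill -TERM "$parent"
wait "$parent" 2>/dev/null || true
```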

In containers, PID 1 must reap children. Use tini or dumb-init:

ENTRYPOINT ["tini", "--"]
CMD ["python", "app.py"]

trap in Bash Scripts

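#!/bin/bash
set -euo pipefail

PIDFILE=/var/run/my-service.pid

# cleanup [EXIT_STATUS]: stop children, remove the PID file, exit.
cleanup() {
    trap - SIGTERM SIGINT SIGHUP          # Do not re-enter on a second signal
    echo "Shutting down..."
    local pids
    pids=$(jobs -p)
    if [ -n "$pids" ]; then
        # Word splitting is intended: one argument per PID.
        kill $pids 2>/dev/null || true    # Forward SIGTERM to background children
        wait $pids 2>/dev/null || true    # Reap them before exiting
    fi
    rm -f "$PIDFILE"
    exit "${1:-0}"
}

trap cleanup SIGTERM SIGINT SIGHUP
trap '' SIGPIPE    # Ignore broken pipes

echo $$ > "$PIDFILE"

# Run app in background so trap can fire between waits
/usr/local/bin/my-app &
status=0
wait $! || status=$?    # Without "||", set -e would exit here and skip cleanup
cleanup "$status"       # Propagate the app's exit status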

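The key insight: bash runs a trap handler only after the current foreground command finishes, so a signal that arrives during a long foreground command waits until that command exits. The wait builtin is the exception: a trapped signal interrupts it immediately. Running the app with & and blocking in wait lets the trap fire promptly on signal delivery.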
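
The difference is easy to measure. This is a sketch assuming bash and date +%s; time_to_handle is a hypothetical helper that sends SIGTERM one second after start and reports how long the script took to exit.

```shell
# Time how long a script takes to react to SIGTERM sent one second after start.
# $1 is the body to run after the trap is installed.
time_to_handle() {
    local start end pid
    start=$(date +%s)
    bash -c "trap 'exit 0' TERM; $1" > /dev/null 2>&1 &
    pid=$!
    sleep 1
    kill -TERM "$pid"
    wait "$pid" 2>/dev/null || true
    end=$(date +%s)
    echo $((end - start))
}

foreground_delay=$(time_to_handle 'sleep 3; true')        # Trap waits for sleep to finish
background_delay=$(time_to_handle 'sleep 3 & wait $!')    # wait is interrupted at once
echo "foreground: ${foreground_delay}s, background+wait: ${background_delay}s"
```

The foreground variant exits only after its sleep completes, while the background variant exits about when the signal is sent. In a container with a short grace period, that gap decides whether the handler runs at all before SIGKILL arrives.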


Key Takeaways

  1. Signals are async notifications. SIGTERM asks politely, SIGKILL forces. Always SIGTERM first.
  2. SIGKILL and SIGSTOP cannot be caught. Everything else can be handled or ignored.
  3. SIGHUP means "reload config" for daemons and "terminal gone" for everything else.
  4. Process states: S is normal, D is stuck on I/O (unkillable), Z is a dead child awaiting reap.
  5. nice/renice control scheduling priority. Use them to protect production from batch work.
  6. Process groups let you signal entire pipelines. Sessions tie to terminal sessions.
  7. nohup/disown keep processes alive after logout. systemd-run is the modern alternative.
  8. Zombies cannot be killed. Fix or kill the parent and PID 1 reaps them.
  9. trap in bash implements graceful shutdown. Run children in background and use wait.
