Process Management - Primer¶
Why This Matters¶
Every service you deploy, every container you run, every script you fire off — it is a process. When things go wrong in production, the answer is almost always hiding in process behavior: a zombie consuming a PID slot, a D-state process blocking a mount, an orphan leaking file descriptors. If you cannot read process state, you cannot debug Linux systems. Period.
Understanding process management is not about memorizing signal numbers. It is about knowing how the kernel manages work, how parent-child relationships define cleanup responsibility, and how to intervene surgically when something goes sideways.
Process Lifecycle: Fork, Exec, Wait¶
Every process in Linux begins the same way. There is no "create process" system call. Instead:
Parent Process (PID 100)
│
├── fork() ──────▶ Child Process (PID 101)
│                  [exact copy of parent]
│                  │
│                  ├── exec()
│                  │   [replaces memory with new program]
│                  │
│                  ├── ... does work ...
│                  │
│                  └── exit(status)
│                               │
└── wait(&status) ◀─────────────┘
    [collects exit code, reaps child]
Under the hood: Modern Linux does not actually copy the parent's memory on fork(). It uses copy-on-write (COW): parent and child share the same physical pages, marked read-only. Only when one process writes to a page does the kernel create a private copy. This makes fork() fast even for processes using gigabytes of RAM. Redis exploits this for background saves: fork() creates a snapshot without doubling memory usage (unless the dataset is heavily modified during the save).
- fork(): Parent creates a child. Child is an almost-exact copy (same memory, file descriptors, environment). Child gets a new PID.
- exec(): Child replaces itself with a new program. The PID stays the same.
- exit(): Child terminates, becomes a zombie until parent calls wait().
- wait(): Parent collects exit status. Zombie is reaped. PID is freed.
This is why every process has a parent. Check with ps -o pid,ppid,comm (the PPID column is the parent's PID).
The only exception is PID 1 (init/systemd), which has no parent and adopts orphans.
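A shell performs this fork, exec, wait sequence for every external command it runs, so the lifecycle can be observed from a script. The following is a minimal sketch, assuming a POSIX shell; the exit code 7 and the variable names are arbitrary choices for the demonstration.

```shell
# The shell forks a child, the child execs "sh", and that program exits with 7.
sh -c 'exit 7' &
child=$!                      # PID of the forked child

# wait reaps the child and returns its exit status.
status=0
wait "$child" || status=$?

echo "child $child exited with status $status"
```

Between the child's exit and the wait call, the child is a zombie. The wait is what frees its PID.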
Signals¶
Signals are the kernel's way of poking a process. They are software interrupts.
The Signals That Matter¶
| Signal | Number | Default Action | Can Catch? | Use Case |
|---|---|---|---|---|
| SIGHUP | 1 | Terminate | Yes | Reload config (daemons) |
| SIGINT | 2 | Terminate | Yes | Ctrl+C |
| SIGQUIT | 3 | Core dump | Yes | Ctrl+\ (with core dump) |
| SIGKILL | 9 | Terminate | No | Unconditional kill |
| SIGSEGV | 11 | Core dump | Yes | Segmentation fault |
| SIGTERM | 15 | Terminate | Yes | Polite shutdown request |
| SIGSTOP | 19 | Stop | No | Unconditional pause |
| SIGCONT | 18 | Continue | Yes | Resume stopped process |
| SIGCHLD | 17 | Ignore | Yes | Child state change |
| SIGUSR1 | 10 | Terminate | Yes | Application-defined |
| SIGUSR2 | 12 | Terminate | Yes | Application-defined |
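The numbers in this table are the ones used on x86 and ARM Linux. A few of them (SIGUSR1, SIGUSR2, SIGCHLD, SIGSTOP, SIGCONT among others) differ on some other architectures, so it is safer to look a number up than to hard-code it. A small sketch using the shell's kill -l:

```shell
# Translate signal numbers to names on the current system.
for n in 1 2 9 15; do
    printf '%s=%s\n' "$n" "$(kill -l "$n")"
done
```

On Linux this prints 1=HUP, 2=INT, 9=KILL, and 15=TERM, one per line. In scripts, prefer names (kill -TERM) over numbers for the same reason.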
SIGTERM vs SIGKILL — This Is Not Optional Knowledge¶
SIGTERM (15):
"Please shut down gracefully."
- Process CAN catch it
- Process CAN clean up (flush buffers, close connections, remove PID files)
- Process CAN ignore it (badly behaved, but possible)
SIGKILL (9):
"You are dead. The kernel is removing you. Now."
- Process CANNOT catch it
- Process CANNOT clean up
- Kernel terminates the process immediately
- System V shared memory segments, temp files, and lock files can all be left behind
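To make "CAN catch it" concrete, here is a minimal sketch of a child that installs a SIGTERM handler and cleans up before exiting. The marker file is a hypothetical stand-in for whatever state a real service would flush or remove, and the one-second sleeps are assumptions chosen to keep the demo simple.

```shell
marker=$(mktemp)              # hypothetical stand-in for "state to clean up"

# Child: install a TERM handler, then loop until signalled.
sh -c 'trap "echo cleaned > \"$1\"; exit 0" TERM
       while :; do sleep 1; done' sh "$marker" &
pid=$!

sleep 1                       # give the child time to install its trap
kill -TERM "$pid"

status=0
wait "$pid" || status=$?      # 0 here, because the handler called exit 0
result=$(cat "$marker")
rm -f "$marker"

echo "exit status: $status, handler wrote: $result"
```

Had the same child been sent SIGKILL, the handler would never run and the marker file would stay empty.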
Always send SIGTERM first. Wait. Only send SIGKILL if the process does not respond. This is what docker stop does (SIGTERM, then SIGKILL after 10s) and what Kubernetes does during pod termination.
Remember: "TERM asks, KILL takes." SIGTERM (15) is a polite request the process can handle. SIGKILL (9) is the kernel forcibly removing the process. Only two signals cannot be caught, blocked, or ignored: SIGKILL (9) and SIGSTOP (19). Every other signal can be handled by the process.
# Correct shutdown sequence
kill "$PID" # Sends SIGTERM (default)
sleep 5 # Grace period; tune to how long the service needs to shut down
kill -0 "$PID" 2>/dev/null && kill -9 "$PID" # SIGKILL only if still alive
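A fixed sleep either waits too long or not long enough. One possible refinement is to poll until the process is gone and escalate only when a timeout expires. The sketch below is an illustration, not a standard tool: stop_gracefully and is_running are hypothetical helper names, the 10 second default mirrors the docker stop default mentioned above, and the state check reads /proc, so it is Linux-specific.

```shell
# is_running PID: true if the process exists and is not a zombie.
is_running() {
    state=$(sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$1/status" 2>/dev/null) || return 1
    [ -n "$state" ] && [ "$state" != "Z" ] && [ "$state" != "X" ]
}

# stop_gracefully PID [TIMEOUT]: SIGTERM, poll once per second, then SIGKILL.
# Returns 0 if the process exited on its own, 1 if SIGKILL was needed.
stop_gracefully() {
    pid=$1
    remaining=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0     # already gone (or not ours to signal)
    while [ "$remaining" -gt 0 ]; do
        is_running "$pid" || return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    is_running "$pid" || return 0
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

Usage: stop_gracefully "$PID" 30. The zombie check matters because kill -0 still succeeds on a process that has exited but has not been reaped yet.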
Sending Signals¶
kill -SIGTERM 1234 # By name
kill -15 1234 # By number
kill -TERM 1234 # Short name
killall -TERM nginx # By process name (all matching)
pkill -TERM -f "python app.py" # By command pattern
Process States¶
Every process is in one of these states at any given moment:
┌─────────┐   fork()    ┌─────────┐
│ Created │────────────▶│  Ready  │
└─────────┘             │   (R)   │
                        └────┬────┘
                             │ scheduled
                             ▼
                        ┌─────────┐
              ┌────────▶│ Running │◀────────┐
              │         │   (R)   │         │
              │         └────┬────┘         │
              │              │              │
         wake │    ┌─────────┼─────────┐    │ continued
              │    │         │         │    │
              │    ▼         ▼         ▼    │
            ┌─────────┐ ┌─────────┐ ┌─────────┐
            │Sleeping │ │ Zombie  │ │ Stopped │
            │ (S/D)   │ │  (Z)    │ │  (T)    │
            └─────────┘ └─────────┘ └─────────┘
| State | Code | Meaning | You Care Because... |
|---|---|---|---|
| Running | R | On CPU or ready to run | Normal, healthy |
| Sleeping (interruptible) | S | Waiting for event, can be signaled | Normal, most processes spend time here |
| Sleeping (uninterruptible) | D | Waiting for I/O, CANNOT be signaled | Danger — cannot kill, usually disk/NFS |
| Stopped | T | Paused by signal (SIGSTOP/SIGTSTP) | Job control, debugging |
| Zombie | Z | Exited, waiting for parent to reap | PID leak if parent never waits |
| Dead | X | Being removed | Transient, rarely seen |
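A quick way to see these codes on a live system is to tally the State line of every process under /proc. This is a minimal sketch, assuming a Linux /proc and a POSIX shell; count_states is a hypothetical helper name. On recent kernels you may also see I, which marks idle kernel threads.

```shell
# count_states: print how many processes are in each state.
count_states() {
    for f in /proc/[0-9]*/status; do
        # A process can exit between the glob and the read; skip it if so.
        state=$(sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "$f" 2>/dev/null) || continue
        if [ -n "$state" ]; then
            echo "$state"
        fi
    done | sort | uniq -c
}

count_states
```

On a healthy machine almost everything is S or I. A growing D or Z count is the signal to dig further.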
D-State: The Unkillable Process¶
A process in D-state (uninterruptible sleep) cannot be killed — not even with SIGKILL. It is waiting for a kernel-level I/O operation to complete. Common causes:
- NFS server is unreachable
- Disk is failing
- FUSE filesystem is hung
- iSCSI target is gone
# Find D-state processes
ps aux | awk '$8 ~ /^D/'
# Check what they are waiting on
cat /proc/<PID>/wchan
cat /proc/<PID>/stack
You cannot kill D-state processes. You fix the I/O subsystem they are waiting on, or you reboot.
Debug clue: A sudden spike in D-state processes is almost always a storage problem: NFS server down, SAN path failure, disk dying, or FUSE filesystem stuck. Check dmesg for I/O errors and /proc/<PID>/wchan to see which kernel function the process is blocked on. Common wchan values: nfs_wait_bit_killable (NFS), blkdev_issue_flush (disk), fuse_request_wait (FUSE).
Zombies and Orphans¶
Zombies¶
A zombie is a process that has exited but whose parent has not called wait(). The zombie holds a slot in the process table (PID, exit status) but consumes no CPU and almost no memory (only its process-table entry).
# Find zombies (STAT can be Z, Zs, Z+, and so on, so match the first letter)
ps aux | awk '$8 ~ /^Z/'
# See who the parent is (PPID column)
ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'
You cannot kill a zombie. It is already dead. You kill its parent (or fix the parent so it reaps children properly). When the parent dies, the zombie is adopted by PID 1, which reaps it.
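You can produce a zombie on purpose to see what one looks like. In this sketch (assuming Linux and a POSIX shell; the temp file and the 3 second lifetime are arbitrary choices) a parent starts a short-lived child and then replaces itself with sleep, which never calls wait().

```shell
pidfile=$(mktemp)

# Parent: start a child that exits at once, record its PID, then become
# "sleep 3" without ever calling wait(), so the child stays a zombie.
sh -c 'sleep 0 & echo $! > "$1"; exec sleep 3' sh "$pidfile" &
parent=$!

sleep 1
zpid=$(cat "$pidfile")
zstate=$(sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$zpid/status")
echo "child $zpid is in state $zstate"    # Z while the parent is alive

wait "$parent"                            # parent exits; the zombie is handed to PID 1
rm -f "$pidfile"
```

The zombie disappears only after its parent exits, which is the same mechanism you rely on when you restart a parent that does not reap.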
Orphans¶
An orphan is a running process whose parent has died. The kernel reparents orphans to PID 1 (init/systemd), which will eventually reap them when they exit.
Orphans are not inherently bad, but they can indicate:
- A supervisor crashed without stopping its children
- A script spawned background processes and exited
- A container init process is not handling adoption
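Reparenting is easy to observe. The sketch below (assuming Linux and a POSIX shell; the temp file and the 5 second sleep are arbitrary) lets an intermediate shell exit while its background child is still running, then reads the child's new parent from /proc.

```shell
pidfile=$(mktemp)

# The intermediate shell starts "sleep 5" in the background, records its PID,
# and exits immediately, which orphans the sleep.
sh -c 'sleep 5 >/dev/null 2>&1 & echo $! > "$1"' sh "$pidfile" &
parent=$!
wait "$parent"

orphan=$(cat "$pidfile")
new_ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$orphan/status")
echo "orphan $orphan: old parent $parent, new parent $new_ppid"

kill "$orphan"
rm -f "$pidfile"
```

The new parent is usually 1. On some systems it is a different process that has registered itself as a subreaper, such as a per-user service manager or a container runtime shim.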
Job Control¶
Job control lets you manage processes from a shell session.
# Run in background
long_task &
# List jobs
jobs -l
# Suspend foreground process
# Press Ctrl+Z (sends SIGTSTP)
# Resume in background
bg %1
# Bring to foreground
fg %1
# Kill by job number
kill %1
nohup — Surviving Logout¶
When you close a terminal, the shell sends SIGHUP to its jobs, and by default they die. nohup starts the command with SIGHUP ignored, so it keeps running.
# This survives logout
nohup ./long_script.sh > /var/log/script.log 2>&1 &
# Modern alternative: use tmux or screen
tmux new-session -d -s mytask './long_script.sh'
# Or use systemd for anything that should be permanent
systemd-run --unit=my-task --remain-after-exit /path/to/script.sh
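You can verify what nohup actually changes by reading the ignored-signal mask of the process it started. This is a sketch under the assumption of a Linux /proc; SigIgn is a hexadecimal bitmask in /proc/<PID>/status, and because SIGHUP is signal 1 it is the lowest bit, so an odd last digit means SIGHUP is ignored.

```shell
nohup sleep 5 >/dev/null 2>&1 &
pid=$!
sleep 1                        # let nohup exec the real command

sigign=$(sed -n 's/^SigIgn:[[:space:]]*//p' "/proc/$pid/status")

# An odd final hex digit means bit 0 (SIGHUP) is set in the mask.
case $sigign in
    *[13579bdf]) hup_ignored=yes ;;
    *)           hup_ignored=no ;;
esac
echo "SigIgn=$sigign, SIGHUP ignored: $hup_ignored"

kill "$pid"
```

The same check is a handy way to confirm whether any long-running process will survive its terminal closing.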
The /proc Filesystem¶
/proc is a virtual filesystem. It does not exist on disk. It is the kernel exposing process state as files.
Per-Process Information¶
# Command line that started the process
cat /proc/1234/cmdline | tr '\0' ' '
# Environment variables
cat /proc/1234/environ | tr '\0' '\n'
# Current working directory
readlink /proc/1234/cwd
# Executable path
readlink /proc/1234/exe
# Open file descriptors
ls -la /proc/1234/fd/
# File descriptor count
ls /proc/1234/fd/ | wc -l
# Memory map
cat /proc/1234/maps
# Memory usage summary
cat /proc/1234/status | grep -E 'VmSize|VmRSS|VmSwap|Threads'
# Process state (the State: line stays parseable even if the command name contains spaces)
awk '/^State:/ {print $2}' /proc/1234/status
# TCP sockets in the process's network namespace (all processes in that namespace, not just this one)
cat /proc/1234/net/tcp
# Limits
cat /proc/1234/limits
# What the process is waiting on (kernel function)
cat /proc/1234/wchan
System-Wide Information¶
# CPU info
cat /proc/cpuinfo | grep "model name" | head -1
# Memory
cat /proc/meminfo | head -10
# Load average
cat /proc/loadavg
# Uptime
cat /proc/uptime
# All mounted filesystems
cat /proc/mounts
# Kernel parameters
cat /proc/sys/kernel/pid_max
cat /proc/sys/fs/file-max
Process Trees¶
Understanding parent-child relationships is critical for debugging:
# Full process tree
pstree -p
# Process tree for a specific PID
pstree -p 1234
# Show process tree with thread names
pstree -pt 1234
# ps with hierarchy
ps auxf
# Find direct children of a process (use pstree -p 1234 for all descendants)
ps --ppid 1234 --forest
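When you need the full set of descendants as plain PIDs, for example to signal a whole subtree, one option is to snapshot every PID and PPID pair from /proc and walk each parent chain. This is a minimal sketch, assuming Linux and awk; descendants is a hypothetical helper name, and the result is only a snapshot because processes can start or exit while it runs.

```shell
# descendants PID: print the PID of every process whose ancestry includes PID.
descendants() {
    # Snapshot "pid ppid" pairs for every process first, then walk the snapshot.
    for status in /proc/[0-9]*/status; do
        awk '/^Pid:/ {pid = $2} /^PPid:/ {print pid, $2}' "$status" 2>/dev/null || true
    done | awk -v root="$1" '
        { parent[$1] = $2 }
        END {
            for (pid in parent) {
                # Follow the parent chain upward; report pid if it reaches root.
                p = parent[pid]
                while (p in parent && p != root) p = parent[p]
                if (p == root) print pid
            }
        }'
}
```

Usage: descendants 1234. If you pass the current shell's own PID, the output also includes the short-lived helper processes the function itself started.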
In containers, PID 1 is the entrypoint. If PID 1 is not a proper init (does not reap zombies, does not forward signals), you get zombie accumulation. This is why tini and dumb-init exist:
# Use tini as PID 1 in containers
RUN apt-get update && apt-get install -y tini
ENTRYPOINT ["tini", "--"]
CMD ["python", "app.py"]
Process Resource Limits¶
Every process has resource limits (ulimits):
# View limits for current shell
ulimit -a
# View limits for a running process
cat /proc/1234/limits
# Key limits:
# Max open files: ulimit -n
# Max processes: ulimit -u
# Max memory (KB): ulimit -v
# Core file size: ulimit -c
Common production issues:
- Too many open files — increase nofile limit
- Cannot fork — hit max processes (nproc)
- No core dump generated — core file size is 0
Set in /etc/security/limits.conf (applied by PAM to login sessions) or in systemd unit files (services started by systemd do not read limits.conf):
# /etc/security/limits.conf
appuser soft nofile 65536
appuser hard nofile 65536
# systemd unit
[Service]
LimitNOFILE=65536
LimitNPROC=4096
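Before raising a limit, it helps to check how close a process actually is to it. The sketch below compares the number of open file descriptors with the soft "Max open files" limit; fd_usage is a hypothetical helper name, and reading another user's /proc/<PID>/fd typically requires root.

```shell
# fd_usage PID: print "<open fds> <soft nofile limit>" for a process.
fd_usage() {
    used=$(ls "/proc/$1/fd" 2>/dev/null | wc -l)
    # In /proc/PID/limits the 4th field of "Max open files" is the soft limit.
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$1/limits")
    echo "$used $soft"
}

fd_usage $$    # descriptors in use by the current shell, then its soft limit
```

A process whose first number keeps climbing toward the second is leaking descriptors, and raising nofile only postpones the failure.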
Key Takeaways¶
- Every process starts with fork/exec and ends with exit/wait — there are no shortcuts
- Always SIGTERM before SIGKILL — give processes a chance to clean up
- D-state processes cannot be killed — fix the underlying I/O problem
- Zombies are not the problem — the parent that is not reaping them is the problem
- /proc is the source of truth for process state; learn to read it directly
- Containers need a proper init process (tini, dumb-init) or zombies accumulate
- Resource limits (ulimits) cause silent failures — check them early in any debugging session