
Portal | Level: L1: Foundations | Topics: Process Management, Linux Fundamentals, Bash / Shell Scripting | Domain: Linux

Process Management - Primer

Why This Matters

Every service you deploy, every container you run, every script you fire off — it is a process. When things go wrong in production, the answer is almost always hiding in process behavior: a zombie consuming a PID slot, a D-state process blocking a mount, an orphan leaking file descriptors. If you cannot read process state, you cannot debug Linux systems. Period.

Understanding process management is not about memorizing signal numbers. It is about knowing how the kernel manages work, how parent-child relationships define cleanup responsibility, and how to intervene surgically when something goes sideways.


Process Lifecycle: Fork, Exec, Wait

Every process in Linux begins the same way. There is no single "create a process running this program" system call; process creation and program loading are separate steps. Instead:

Parent Process (PID 100)
    ├── fork()  ──────▶  Child Process (PID 101)
    │                     [exact copy of parent]
    │                          │
    │                          ├── exec()
    │                          │   [replaces memory with new program]
    │                          │
    │                          ├── ... does work ...
    │                          │
    │                          └── exit(status)
    │                                │
    └── wait(&status)  ◀─────────────┘
        [collects exit code, reaps child]

Under the hood: Modern Linux does not actually copy the parent's memory on fork(). It uses copy-on-write (COW): parent and child share the same physical pages, marked read-only. Only when one process writes to a page does the kernel create a private copy. This makes fork() fast even for processes using gigabytes of RAM. Redis exploits this for background saves -- fork() creates a snapshot without doubling memory usage (unless the dataset is heavily modified during the save).

  1. fork(): Parent creates a child. Child is an almost-exact copy (same memory, file descriptors, environment). Child gets a new PID.
  2. exec(): Child replaces itself with a new program. The PID stays the same.
  3. exit(): Child terminates, becomes a zombie until parent calls wait().
  4. wait(): Parent collects exit status. Zombie is reaped. PID is freed.
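The cycle is directly observable from the shell, which fork()s a child for every external command and exposes wait() as a builtin. A minimal sketch:

```shell
# The shell fork()s a child, which exec()s /bin/sleep;
# $! is the child's PID, and the `wait` builtin reaps it
sleep 1 &
child=$!

wait "$child"                      # parent blocks until the child exits
echo "reaped $child, status $?"    # prints: reaped <PID>, status 0
```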

This is why every process has a parent. Check with:

ps -ef --forest
# or
pstree -p

The exceptions are PID 1 (init/systemd), which has no parent and adopts orphans, and kernel threads, which are children of kthreadd (PID 2).


Signals

Signals are the kernel's way of poking a process. They are software interrupts.

The Signals That Matter

Signal    Number  Default Action  Can Catch?  Use Case
SIGHUP         1  Terminate       Yes         Reload config (daemons)
SIGINT         2  Terminate       Yes         Ctrl+C
SIGQUIT        3  Core dump       Yes         Ctrl+\ (with core dump)
SIGKILL        9  Terminate       No          Unconditional kill
SIGSEGV       11  Core dump       Yes         Segmentation fault
SIGTERM       15  Terminate       Yes         Polite shutdown request
SIGSTOP       19  Stop            No          Unconditional pause
SIGCONT       18  Continue        Yes         Resume stopped process
SIGCHLD       17  Ignore          Yes         Child state change
SIGUSR1       10  Terminate       Yes         Application-defined
SIGUSR2       12  Terminate       Yes         Application-defined

SIGTERM vs SIGKILL — This Is Not Optional Knowledge

SIGTERM (15):
  "Please shut down gracefully."
  - Process CAN catch it
  - Process CAN clean up (flush buffers, close connections, remove PID files)
  - Process CAN ignore it (badly behaved, but possible)

SIGKILL (9):
  "You are dead. The kernel is removing you. Now."
  - Process CANNOT catch it
  - Process CANNOT clean up
  - Kernel terminates the process immediately
  - Shared memory, temp files, locks — all left behind

Always send SIGTERM first. Wait. Only send SIGKILL if the process does not respond. This is what docker stop does (SIGTERM, then SIGKILL after 10s) and what Kubernetes does during pod termination.

Mnemonic: "TERM asks, KILL takes." SIGTERM (15) is a polite request the process can handle. SIGKILL (9) is the kernel forcibly removing the process. Only two signals cannot be caught, blocked, or ignored: SIGKILL (9) and SIGSTOP (19). Everything else is advisory.

# Correct shutdown sequence
kill $PID            # Sends SIGTERM (default)
sleep 5
kill -0 $PID 2>/dev/null && kill -9 $PID   # SIGKILL only if still alive
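On the receiving side, a SIGTERM handler is one `trap` in shell. A sketch of a graceful-shutdown skeleton (the PID-file path is made up):

```shell
#!/bin/bash
# Graceful shutdown: catch SIGTERM, clean up, exit deliberately
cleanup() {
  echo "caught SIGTERM, cleaning up"
  rm -f /tmp/myapp.pid        # hypothetical PID file
  exit 0
}
trap cleanup TERM

echo $$ > /tmp/myapp.pid
while :; do sleep 1; done     # a SIGKILL here would skip cleanup entirely
```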

Sending Signals

kill -SIGTERM 1234         # By name
kill -15 1234              # By number
kill -TERM 1234            # Short name
killall -TERM nginx        # By process name (all matching)
pkill -TERM -f "python app.py"  # By command pattern
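`kill -l` translates between names and numbers, which is useful for sanity-checking the table above on your own platform (some signal numbers vary between architectures):

```shell
kill -l 15       # prints: TERM
kill -l TERM     # prints: 15
kill -l          # every signal this platform defines
```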

Process States

Every process is in one of these states at any given moment:

┌─────────┐    fork()    ┌─────────┐
│ Created │─────────────▶│  Ready  │
└─────────┘              │   (R)   │
                         └────┬────┘
                              │ scheduled
                              ▼
                         ┌─────────┐
               ┌────────▶│ Running │◀────────┐
               │         │   (R)   │         │
               │         └────┬────┘         │
               │              │              │
          wake │    ┌─────────┼─────────┐    │ continued
               │    │         │         │    │ (SIGCONT)
               │    ▼         ▼         ▼    │
           ┌─────────┐  ┌─────────┐  ┌─────────┐
           │Sleeping │  │ Zombie  │  │ Stopped │
           │  (S/D)  │  │   (Z)   │  │   (T)   │
           └─────────┘  └─────────┘  └─────────┘
State                       Code  Meaning                                You Care Because...
Running                     R     On CPU or ready to run                 Normal, healthy
Sleeping (interruptible)    S     Waiting for an event, can be signaled  Normal; most processes live here
Sleeping (uninterruptible)  D     Waiting for I/O, CANNOT be signaled    Danger: unkillable, usually disk/NFS
Stopped                     T     Paused by signal (SIGSTOP/SIGTSTP)     Job control, debugging
Zombie                      Z     Exited, waiting for parent to reap     PID leak if parent never waits
Dead                        X     Being removed                          Transient, rarely seen
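A quick census of states across the whole box, reading /proc directly so it works even where ps is missing (busybox containers, rescue shells):

```shell
# Count processes per state via the State: line of /proc/<PID>/status
for d in /proc/[0-9]*; do
  awk '/^State:/ {print $2}' "$d/status" 2>/dev/null
done | sort | uniq -c | sort -rn
```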

D-State: The Unkillable Process

A process in D-state (uninterruptible sleep) cannot be killed — not even with SIGKILL. It is waiting for a kernel-level I/O operation to complete. Common causes:

  • NFS server is unreachable
  • Disk is failing
  • FUSE filesystem is hung
  • iSCSI target is gone
# Find D-state processes
ps aux | awk '$8 ~ /D/'

# Check what they are waiting on
cat /proc/<PID>/wchan
cat /proc/<PID>/stack

You cannot kill D-state processes. You fix the I/O subsystem they are waiting on, or you reboot.

Debug clue: A sudden spike in D-state processes is almost always a storage problem: NFS server down, SAN path failure, disk dying, or FUSE filesystem stuck. Check dmesg for I/O errors and /proc/<PID>/wchan to see which kernel function the process is blocked on. Common wchan values: nfs_wait_bit_killable (NFS), blkdev_issue_flush (disk), fuse_request_wait (FUSE).
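The ps and wchan checks above combine into a single /proc walk (a sketch; reading wchan for other users' processes may require root, in which case the field shows up empty):

```shell
# Print PID and the blocking kernel function for every D-state process
for d in /proc/[0-9]*; do
  state=$(awk '/^State:/ {print $2}' "$d/status" 2>/dev/null)
  if [ "$state" = "D" ]; then
    printf '%s\t%s\n' "${d#/proc/}" "$(cat "$d/wchan" 2>/dev/null)"
  fi
done
```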


Zombies and Orphans

Zombies

A zombie is a process that has exited but whose parent has not yet called wait(). The zombie holds a slot in the process table (PID, exit status) but consumes no CPU and almost no memory.

# Find zombies
ps aux | awk '$8 == "Z"'

# See who the parent is
ps -o pid,ppid,stat,comm -p $(ps aux | awk '$8 == "Z" {print $2}')

You cannot kill a zombie. It is already dead. You kill its parent (or fix the parent so it reaps children properly). When the parent dies, the zombie is adopted by PID 1, which reaps it.

Orphans

An orphan is a running process whose parent has died. The kernel reparents orphans to PID 1 (init/systemd), which will eventually reap them when they exit.

Orphans are not inherently bad, but they can indicate:

  • A supervisor crashed without stopping its children
  • A script spawned background processes and exited
  • A container init process is not handling adoption
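Reparenting is easy to watch: launch a background process from a subshell, let the subshell exit immediately, then read the orphan's new PPid from /proc. It is typically 1, though on some systems a subreaper (such as a user systemd instance) adopts it instead:

```shell
# The subshell is the sleep's parent and exits right after echoing,
# so the kernel reparents the orphaned sleep
pid=$( (sleep 2 >/dev/null 2>&1 & echo $!) )
sleep 0.2                                        # give reparenting a moment
awk '/^PPid:/ {print $2}' "/proc/$pid/status"    # typically prints 1
```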


Job Control

Job control lets you manage processes from a shell session.

# Run in background
long_task &

# List jobs
jobs -l

# Suspend foreground process
# Press Ctrl+Z  (sends SIGTSTP)

# Resume in background
bg %1

# Bring to foreground
fg %1

# Kill by job number
kill %1

nohup — Surviving Logout

When you close a terminal, the kernel sends SIGHUP to the controlling shell, and the shell in turn sends SIGHUP to its jobs. By default, they die.

# This survives logout
nohup ./long_script.sh > /var/log/script.log 2>&1 &

# Modern alternative: use tmux or screen
tmux new-session -d -s mytask './long_script.sh'

# Or use systemd for anything that should be permanent
systemd-run --unit=my-task --remain-after-exit /path/to/script.sh
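The nohup effect is verifiable: it sets SIGHUP to "ignore" before exec'ing the command, so a later HUP is simply dropped (a sketch, using sleep as a stand-in for a real task):

```shell
nohup sleep 30 >/dev/null 2>&1 &
pid=$!
sleep 0.5                              # let nohup exec the command
kill -HUP "$pid"                       # dropped: SIGHUP disposition is SIG_IGN
kill -0 "$pid" && echo "still alive"   # prints: still alive
kill "$pid"                            # tidy up with SIGTERM
```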

The /proc Filesystem

/proc is a virtual filesystem. It does not exist on disk. It is the kernel exposing process state as files.

Per-Process Information

# Command line that started the process
cat /proc/1234/cmdline | tr '\0' ' '

# Environment variables
cat /proc/1234/environ | tr '\0' '\n'

# Current working directory
readlink /proc/1234/cwd

# Executable path
readlink /proc/1234/exe

# Open file descriptors
ls -la /proc/1234/fd/

# File descriptor count
ls /proc/1234/fd/ | wc -l

# Memory map
cat /proc/1234/maps

# Memory usage summary
cat /proc/1234/status | grep -E 'VmSize|VmRSS|VmSwap|Threads'

# Process state
cat /proc/1234/stat | awk '{print $3}'

# Network connections (per process)
cat /proc/1234/net/tcp

# Limits
cat /proc/1234/limits

# What the process is waiting on (kernel function)
cat /proc/1234/wchan
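These one-liners compose into a tiny triage helper; `procsum` is a made-up name, not a standard tool:

```shell
# One-shot summary of a PID, straight from /proc
procsum() {
  local pid=$1
  printf 'cmd:   %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
  printf 'cwd:   %s\n' "$(readlink "/proc/$pid/cwd")"
  printf 'state: %s\n' "$(awk '/^State:/ {print $2, $3}' "/proc/$pid/status")"
  printf 'fds:   %s\n' "$(ls "/proc/$pid/fd" | wc -l)"
}

procsum $$    # summarize the current shell
```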

System-Wide Information

# CPU info
cat /proc/cpuinfo | grep "model name" | head -1

# Memory
cat /proc/meminfo | head -10

# Load average
cat /proc/loadavg

# Uptime
cat /proc/uptime

# All mounted filesystems
cat /proc/mounts

# Kernel parameters
cat /proc/sys/kernel/pid_max
cat /proc/sys/fs/file-max
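pid_max is worth knowing because zombies and fork bombs consume PID slots. A rough headroom check (a sketch; threads occupy PID space too, so counting /proc entries undercounts):

```shell
# Rough PID headroom: /proc entries vs. the kernel's ceiling
used=$(ls -d /proc/[0-9]* | wc -l)
max=$(cat /proc/sys/kernel/pid_max)
echo "$used of $max PIDs in use"
```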

Process Trees

Understanding parent-child relationships is critical for debugging:

# Full process tree
pstree -p

# Process tree for a specific PID
pstree -p 1234

# Show process tree with threads
pstree -pt 1234

# ps with hierarchy
ps auxf

# Find all descendants of a process
ps --ppid 1234 --forest
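When ps is unavailable, the same ancestry can be walked recursively from /proc alone (a sketch; `descendants` is a made-up helper name):

```shell
# Print every descendant PID of $1, depth-first, by matching PPid:
descendants() {
  local d child
  for d in /proc/[0-9]*; do
    child=${d#/proc/}
    if [ "$(awk '/^PPid:/ {print $2}' "$d/status" 2>/dev/null)" = "$1" ]; then
      echo "$child"
      descendants "$child"
    fi
  done
}

descendants $$    # everything spawned under the current shell
```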

In containers, PID 1 is the entrypoint. If PID 1 is not a proper init (does not reap zombies, does not forward signals), you get zombie accumulation. This is why tini and dumb-init exist:

# Use tini as PID 1 in containers
RUN apt-get update && apt-get install -y tini
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["python", "app.py"]

Process Resource Limits

Every process has resource limits (ulimits):

# View limits for current shell
ulimit -a

# View limits for a running process
cat /proc/1234/limits

# Key limits:
# Max open files:       ulimit -n
# Max processes:        ulimit -u
# Max memory (KB):      ulimit -v
# Core file size:       ulimit -c

Common production issues:

  • Too many open files: increase the nofile limit
  • Cannot fork: hit the max processes limit (nproc)
  • No core dump generated: core file size is 0

Set in /etc/security/limits.conf or systemd unit files:

# /etc/security/limits.conf
appuser  soft  nofile  65536
appuser  hard  nofile  65536

# systemd unit
[Service]
LimitNOFILE=65536
LimitNPROC=4096
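The soft/hard distinction is visible in any shell: an unprivileged process may freely move its soft limit anywhere up to the hard limit, and children inherit the result. A subshell keeps the experiment contained:

```shell
# The subshell lowers its own soft nofile limit; the parent is untouched
( ulimit -Sn 256; echo "subshell soft nofile: $(ulimit -Sn)" )
echo "parent soft nofile:   $(ulimit -Sn)"
```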

Key Takeaways

  1. Every process starts with fork/exec and ends with exit/wait — there are no shortcuts
  2. Always SIGTERM before SIGKILL — give processes a chance to clean up
  3. D-state processes cannot be killed — fix the underlying I/O problem
  4. Zombies are not the problem — the parent that is not reaping them is the problem
  5. /proc is the source of truth for process state — learn to read it directly
  6. Containers need a proper init process (tini, dumb-init) or zombies accumulate
  7. Resource limits (ulimits) cause silent failures — check them early in any debugging session
