Linux The Complete Guide

Linux — The Complete Guide: From Power Button to Production Mastery

Topics: Boot process (BIOS/UEFI, GRUB, kernel, initramfs, systemd), processes & signals, filesystems & storage (ext4, XFS, LVM, RAID), memory management (virtual memory, page cache, swap, OOM killer), networking (TCP/IP, iptables, bridges, bonds, VLANs), SSH, permissions & security (SELinux, AppArmor, hardening), debugging (strace, /proc, eBPF, performance), systemd (unit files, targets, timers, journald), package management, cgroups & namespaces, kernel tuning, text processing, logging
Strategy: Build up from bare metal to production operations, with war stories, trivia, and drills throughout
Level: L0–L2 (Zero → Foundations → Operations)
Time: 5–6 hours (designed for deep study in one or multiple sittings)
Prerequisites: Access to a Linux terminal. No prior Linux experience required; everything is explained from scratch.

The Mission¶

A rack-mount server sits in a datacenter. You press the power button. Forty-five seconds later, you SSH in, check disk space, restart a service, and deploy an application. In those 45 seconds, the machine went from no electricity to a running Linux system with a filesystem, a network stack, 200 services, and a login prompt. It executed firmware whose design lineage goes back to the early 1980s, a bootloader that's a small operating system, a kernel that unpacked itself from a compressed archive, a temporary filesystem that exists only in RAM, and a process manager that started everything in parallel.

By the end of this guide you'll understand every layer of that stack — from the moment electricity hits the motherboard to the moment you debug a production issue at 3 AM. This is the one document you need to go from "I type commands in a terminal" to "I understand what Linux is actually doing."

Table of Contents¶

The Boot Sequence — Power to Login
The Kernel — What Linux Actually Is
systemd — The Process Manager
Processes, Signals, and Process Control
Users, Permissions, and Ownership
The Filesystem — Everything Is a File
Storage — Disks, Partitions, LVM, RAID
Memory Management — Virtual Memory to OOM Killer
Networking Fundamentals — TCP/IP, DNS, Routing
iptables and Firewalls — Following a Packet
SSH — The Protocol That Runs Infrastructure
The /proc Filesystem — Linux's Hidden API
Debugging with strace — Reading System Calls
Performance Triage — The USE Method
Logging — journald, syslog, and Log Management
Package Management — apt, dnf, and Friends
Text Processing — grep, awk, sed, and the Pipeline
cgroups and Namespaces — Container Foundations
Security Hardening — Closing the Doors
eBPF — The Linux Superpower
Linux Distributions — Choosing and Understanding
On-Call Survival Guide
Real-World Case Studies
Glossary
Trivia and History
Flashcard Review
Drills
Cheat Sheet
Self-Assessment

Part 1: The Boot Sequence¶

You press the power button. Here's everything that happens.

Stage 1: Firmware (BIOS/UEFI)¶

The power supply stabilizes voltage and sends a "Power Good" signal (roughly 100–500 ms). The CPU begins executing from a hardwired address, the reset vector (0xFFFFFFF0 on x86). At this moment, RAM isn't initialized, no storage is accessible, and no operating system is loaded.

BIOS (the old way, 1981–2020):

Power on → POST (Power-On Self-Test) → Read first 512 bytes (MBR) → Jump to bootloader

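The MBR is exactly 512 bytes: a 446-byte bootstrap area (up to 440 bytes of boot code, plus a 4-byte disk signature and 2 reserved bytes on modern disks), 64 bytes of partition table (max 4 primary partitions, max 2 TiB disk with 512-byte sectors), and the 2-byte 0x55AA boot signature.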

Trivia: The 0x55AA signature has been the same since the original IBM PC in 1981. Its bit pattern (01010101 10101010) alternates between 0 and 1, making it unlikely to occur randomly.
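
The MBR layout can be checked byte by byte. The following is a minimal Python sketch, not a partitioning tool. `build_sample_sector` and `parse_mbr` are hypothetical helper names, and the sample partition values (type 0x83, start LBA 2048) are made up for illustration. It assumes the classic layout: a 446-byte bootstrap area, four 16-byte partition entries, and the signature bytes 0x55 0xAA at offsets 510 and 511.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446          # partition table starts right after the bootstrap area
ENTRY_SIZE = 16             # four entries of 16 bytes each = 64 bytes
SIGNATURE = b"\x55\xaa"     # stored at offsets 510 and 511

# status byte, 3 CHS bytes (skipped), type byte, 3 CHS bytes (skipped),
# 32-bit start LBA, 32-bit sector count, all little-endian
ENTRY_FORMAT = "<B3xB3xII"


def build_sample_sector() -> bytes:
    """Build a fake MBR holding one bootable Linux partition (illustrative values)."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into(ENTRY_FORMAT, sector, TABLE_OFFSET, 0x80, 0x83, 2048, 1_000_000)
    sector[510:512] = SIGNATURE
    return bytes(sector)


def parse_mbr(sector: bytes) -> dict:
    """Check the boot signature and list the non-empty partition entries."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    partitions = []
    for index in range(4):
        offset = TABLE_OFFSET + index * ENTRY_SIZE
        status, ptype, start_lba, sectors = struct.unpack_from(ENTRY_FORMAT, sector, offset)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "bootable": status == 0x80,
                "type": ptype,
                "start_lba": start_lba,
                "sectors": sectors,
            })
    return {"valid_signature": sector[510:512] == SIGNATURE, "partitions": partitions}


# 32-bit sector counts with 512-byte sectors give the 2 TiB ceiling
MAX_MBR_BYTES = 2**32 * 512

if __name__ == "__main__":
    print(parse_mbr(build_sample_sector()))
    print(MAX_MBR_BYTES // 1024**4, "TiB maximum")
```

On a real machine the same parser could be fed the first 512 bytes of a disk image; reading a raw device such as /dev/sda normally requires root.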

UEFI (the modern way, 2005+):

Power on → POST → Read NVRAM boot entries → Load EFI application from ESP → EFI app is bootloader

Feature	BIOS	UEFI
Partition table	MBR (2TB max, 4 partitions)	GPT (9.4 ZB max, 128 partitions)
Bootloader size	440 bytes in MBR	Full binary on ESP (FAT32 partition)
Secure Boot	No	Yes — cryptographic chain of trust
Environment	16-bit, 1MB address space	32/64-bit, GiB of address space

Secure Boot chain: UEFI firmware → shimx64.efi (signed by Microsoft) → grubx64.efi (signed by distro) → vmlinuz (signed by distro). If any signature fails, boot halts.

Stage 2: GRUB — Loading the Kernel¶

GRUB2 (GRand Unified Bootloader) is a small operating system with filesystem drivers, a shell, and a scripting language.

# See current kernel command line
cat /proc/cmdline
# → BOOT_IMAGE=/vmlinuz-6.5.0-44-generic root=UUID=abc123... ro quiet splash

# See boot timing
systemd-analyze
# → Startup finished in 2.5s (firmware) + 3.1s (loader) + 1.8s (kernel) + 8.4s (userspace)

Key kernel command line parameters:

Parameter	Purpose
`root=UUID=...`	Where to find the root filesystem
`ro`	Mount root read-only initially (for fsck)
`single` or `1`	Boot to single-user (rescue) mode
`init=/bin/bash`	Skip init entirely, drop to shell (emergency)
`rd.break`	Break into initramfs shell before switch_root
`console=ttyS0,115200`	Serial console for headless servers
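
The parameters in the table are plain space-separated tokens, either `name=value` pairs or bare flags. As a rough sketch of how a script might inspect them, the hypothetical `parse_cmdline` helper below splits such a string into a dictionary. The sample string is modelled on the `/proc/cmdline` output shown earlier, with a placeholder UUID, and repeated parameters (for example two `console=` entries) simply keep the last value in this simplified version.

```python
import shlex


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to True."""
    params = {}
    for token in shlex.split(cmdline):
        # Split on the first "=" only, so root=UUID=... keeps its full value
        name, sep, value = token.partition("=")
        params[name] = value if sep else True
    return params


# Illustrative string; the UUID is a placeholder, not a real identifier.
SAMPLE = (
    "BOOT_IMAGE=/vmlinuz-6.5.0-44-generic root=UUID=abc123 "
    "ro quiet splash console=ttyS0,115200"
)

if __name__ == "__main__":
    for name, value in parse_cmdline(SAMPLE).items():
        print(f"{name:12} {value}")
    # On a live system the input would come from the kernel itself:
    #   parse_cmdline(open("/proc/cmdline").read())
```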

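Gotcha: Never edit /boot/grub/grub.cfg directly; it is regenerated by update-grub (grub2-mkconfig on RHEL-family systems). Edit /etc/default/grub instead, then run update-grub (or grub2-mkconfig -o with your distribution's grub.cfg path) to apply the change.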

Stage 3: Kernel Initialization¶

The kernel image (vmlinuz, where the "z" means compressed) decompresses itself, then:
1. Detects CPU features and security mitigations
2. Builds the memory map and page tables
3. Configures interrupts
4. Enumerates PCI/PCIe devices (NICs, storage controllers, GPUs)
5. Initializes built-in drivers

# See kernel boot messages
dmesg | head -50
# → [0.000000] Linux version 6.5.0-44-generic ...
# → [0.123456] PCI: Using host bridge windows ...
# → [0.345678] nvme nvme0: pci function 0000:01:00.0

Stage 4: Initramfs — The Bridge to Root¶

The kernel needs to mount the root filesystem, but the root might be on LVM, LUKS encryption, software RAID, or an NVMe drive whose driver isn't compiled in. The initramfs (initial RAM filesystem) is a compressed CPIO archive containing just enough tools and drivers to find and mount the real root.

Initramfs (in RAM)          Real Root (on disk)
├── /init                   ├── /sbin/init → systemd
├── /bin/busybox            ├── /etc/
├── /lib/modules/           ├── /var/
└── /scripts/               └── /home/
        ↓ switch_root ↓

When initramfs can't find root:

ALERT! UUID=abc123... does not exist. Dropping to a shell!
(initramfs) _

Gotcha: If you change storage controllers (SATA→NVMe, new RAID card), rebuild initramfs before rebooting: update-initramfs -u (Debian) or dracut --force (RHEL). Otherwise: kernel panic.

Stage 5: PID 1 — systemd Takes Over¶

The kernel executes /sbin/init (symlink to systemd). PID 1 is special:
- Can't be killed: the kernel drops unhandled signals
- If it exits, the kernel panics and the system is dead
- Reaps orphans: cleans up processes whose parents died

# What took longest to boot?
systemd-analyze blame | head -10

# Critical path (bottleneck chain)
systemd-analyze critical-chain

Part 2: The Kernel¶

What Linux Actually Is¶

Linux is a kernel — the core of the operating system that controls CPU, memory, devices, and provides system calls. Everything else (bash, systemd, grep, nginx) is userspace software that talks to the kernel via syscalls.

[Your commands  ]  bash/zsh runs tools, pipes streams
[Userspace tools]  ps, ss, journalctl, find, ip, grep
[Libraries      ]  glibc, NSS, SSL, PAM
[Syscalls       ]  open/read/write/fork/exec/socket
[Kernel         ]  scheduler, VFS, network stack, drivers
[Hardware       ]  CPU, RAM, disk, NIC

Key Kernel Concepts¶

System calls (syscalls): The only way userspace can interact with hardware. Every file open, network connection, and process creation goes through a syscall.

Kernel modules: Drivers and features that can be loaded/unloaded without rebooting:

lsmod                           # List loaded modules
modprobe nvidia                 # Load a module
modinfo ext4                    # Module information

Kernel ring buffer: Hardware detection and driver messages from the moment the kernel starts:

dmesg -T | tail -50             # Recent kernel messages with timestamps
dmesg | grep -i error           # Find kernel errors

Kernel parameters (sysctl): Runtime-tunable kernel behavior:

sysctl -a | wc -l               # Hundreds of tunable parameters
sysctl net.ipv4.ip_forward      # Check IP forwarding
sysctl -w net.ipv4.ip_forward=1 # Enable (temporary)
# Persistent: add to /etc/sysctl.d/99-custom.conf

Part 3: systemd¶

systemd is the init system and service manager on virtually all modern Linux distributions. It replaced SysV init's sequential shell scripts with parallel, dependency-based service management.

Essential Commands¶

# Service management
systemctl status nginx          # Status + recent logs
systemctl start nginx           # Likewise: stop, restart, reload
systemctl enable nginx          # Boot persistence (disable to undo)
systemctl enable --now nginx    # Enable AND start

# Finding problems
systemctl list-units --failed   # Failed services
systemctl list-units --type=service --state=running

# After editing unit files
systemctl daemon-reload

Unit Files¶

# /etc/systemd/system/myapp.service
[Unit]
Description=My Application
After=network.target postgresql.service
Wants=postgresql.service

[Service]
Type=simple
User=deploy
Group=deploy
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/server --port 8080
Restart=on-failure
RestartSec=5
Environment=NODE_ENV=production

# Resource limits
MemoryMax=512M
CPUQuota=200%

# Security hardening
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/lib/myapp /var/log/myapp

[Install]
WantedBy=multi-user.target

Directive	Meaning
`After=`	Start after these units (ordering)
`Wants=`	Soft dependency — start these, but don't fail if they fail
`Requires=`	Hard dependency — fail if these fail
`Type=simple`	Process stays in foreground (most common)
`Type=forking`	Process forks and parent exits (legacy daemons)
`Restart=on-failure`	Restart on non-zero exit code, termination by signal, timeout, or watchdog failure
`RestartSec=5`	Wait 5 seconds between restarts
`WantedBy=multi-user.target`	Enable means "start at boot"

Warning: network.target does NOT mean "the network is ready." It means the networking stack startup has been initiated. Use network-online.target only for services that truly need configured connectivity before starting. Most server daemons do NOT need it — they bind a socket and accept connections whenever they arrive.

Drop-in Overrides¶

Customize a unit without editing the original file:

# Create an override
systemctl edit nginx
# Creates /etc/systemd/system/nginx.service.d/override.conf

# Or manually:
mkdir -p /etc/systemd/system/nginx.service.d/
cat > /etc/systemd/system/nginx.service.d/override.conf << 'EOF'
[Service]
MemoryMax=1G
LimitNOFILE=65536
EOF
systemctl daemon-reload
systemctl restart nginx

Timers (Cron Replacement)¶

# /etc/systemd/system/backup.timer
[Unit]
Description=Run backup daily

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target

systemctl enable --now backup.timer
systemctl list-timers                  # See all timers

journald — Structured Logging¶

journalctl -u nginx -f                 # Follow service logs
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since "2026-03-23 00:00" --until "2026-03-23 06:00"
journalctl -p err -b                   # Errors since boot
journalctl -k                          # Kernel messages only
journalctl --disk-usage                # Log storage used
journalctl --vacuum-size=500M          # Trim logs to 500MB
journalctl -o json-pretty -u nginx -n 1  # JSON output

Why systemd is controversial: It replaced a system (SysV init, 1983) that used simple shell scripts anyone could read. systemd is a complex binary that manages services, logging, networking, timers, hostname, locale, and more. Critics say it violates Unix philosophy ("do one thing well"). Supporters say it solved real problems: parallel boot, dependency management, process supervision, and resource isolation. The Debian vote in 2014 nearly split the project, and Devuan was forked specifically to maintain a systemd-free Debian.

Part 4: Processes and Signals¶

Process Lifecycle¶

fork() → new process (copy of parent)
exec() → replace process image with new program
wait() → parent collects child's exit status
exit() → process terminates
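
The four calls above map directly onto Python's `os` module on Unix-like systems. The sketch below is a minimal illustration under that assumption; `spawn_and_wait` is a hypothetical name, and a real program would normally use `subprocess` instead of calling `fork()` by hand.

```python
import os
import sys


def spawn_and_wait(argv: list) -> int:
    """fork() a child, exec() argv in it, wait() for it, and return its exit code."""
    pid = os.fork()                      # both parent and child return from this call
    if pid == 0:                         # child: replace our image with argv
        try:
            os.execvp(argv[0], argv)     # only returns (by raising) on failure
        except OSError:
            os._exit(127)                # conventional "command not found" code
    _, status = os.waitpid(pid, 0)       # parent: reap the child so no zombie remains
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)          # child was killed by a signal


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("child exited with", code)
```

Skipping the `waitpid()` call is exactly what produces the zombie state: the child has exited, but nobody has collected its status.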

Every process has:
- PID: unique process ID
- PPID: parent process ID
- UID/GID: owner
- File descriptors: open files, sockets, pipes
- Memory mappings: code, heap, stack, shared libraries
- cgroup membership: resource limits

ps aux                                  # All processes
ps -eo pid,ppid,%cpu,%mem,cmd --sort=-%cpu | head
pstree -p                              # Process tree with PIDs

Process States¶

State	Symbol	Meaning
Running	R	Executing on CPU or runnable
Sleeping	S	Waiting for event (interruptible)
Disk sleep	D	Waiting for I/O (uninterruptible — can't be killed)
Zombie	Z	Exited but not yet reaped by parent
Stopped	T	Stopped by signal (Ctrl+Z)

Zombies: A process that has exited but whose parent hasn't called wait(). Each zombie holds only a process-table entry (its PID and exit status), but a parent that never reaps can leak enough of them to exhaust the PID space.

# Find zombies (STAT may be "Z", "Z+", "Zs", so match on the first letter)
ps aux | awk '$8 ~ /^Z/'
# Find their parent (PPID is the second column)
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'

Orphans: When a parent dies, children are re-parented to PID 1 (systemd), which reaps them.

Signals¶

Signal	Number	Default Action	Purpose
SIGHUP	1	Terminate	Reload config (by convention)
SIGINT	2	Terminate	Ctrl+C
SIGQUIT	3	Core dump	Ctrl+\
SIGKILL	9	Terminate	Cannot be caught or ignored
SIGTERM	15	Terminate	Graceful shutdown (default kill)
SIGSTOP	19	Stop	Pause process (cannot be caught or ignored)
SIGTSTP	20	Stop	Ctrl+Z (can be caught)
SIGCONT	18	Continue	Resume stopped process
SIGCHLD	17	Ignore	Child process state changed

Mnemonic: "1 for Hangup, 15 for Terminate, 9 for Kill." Always try SIGTERM before SIGKILL — SIGTERM allows cleanup (flush buffers, close connections). SIGKILL is instant death.
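
Why SIGTERM permits cleanup while SIGKILL does not comes down to whether a handler can be installed. The sketch below is a minimal Python illustration, with hypothetical helper names, that assumes it runs in the main thread of a Unix process (Python only allows signal handlers there). It installs a SIGTERM handler, signals itself, and shows that the kernel refuses handlers for SIGKILL and SIGSTOP.

```python
import os
import signal
import time

shutdown_requested = False


def handle_sigterm(signum, frame):
    """Only record the request; the main loop performs the actual cleanup."""
    global shutdown_requested
    shutdown_requested = True


def _restore(signum, previous):
    # signal.signal() may report None for handlers installed outside Python
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)


def can_install_handler(signum) -> bool:
    """Return True if the kernel lets a process catch this signal."""
    try:
        previous = signal.signal(signum, handle_sigterm)
    except (OSError, ValueError):
        return False
    _restore(signum, previous)
    return True


def send_sigterm_to_self(timeout: float = 1.0) -> bool:
    """Install the handler, signal ourselves, and report whether it ran."""
    global shutdown_requested
    shutdown_requested = False
    previous = signal.signal(signal.SIGTERM, handle_sigterm)
    try:
        os.kill(os.getpid(), signal.SIGTERM)   # same as: kill PID
        deadline = time.monotonic() + timeout
        while not shutdown_requested and time.monotonic() < deadline:
            time.sleep(0.01)                   # give the handler a chance to run
        return shutdown_requested
    finally:
        _restore(signal.SIGTERM, previous)


if __name__ == "__main__":
    print("SIGTERM catchable:", can_install_handler(signal.SIGTERM))
    print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
    print("handler ran:", send_sigterm_to_self())
```

Keeping the handler down to setting a flag is a deliberate choice: the real cleanup then happens in normal program flow, not inside the signal handler.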

kill PID                # Sends SIGTERM (15) by default
kill -9 PID             # SIGKILL — last resort
kill -HUP PID           # Reload config (nginx, sshd)
kill -0 PID             # Test if process exists (no signal sent)
killall nginx           # Kill all processes named nginx
pkill -f "python app"   # Kill by command pattern

Part 5: Permissions¶

The Permission Model¶

-rwxr-xr-- 1 deploy www-data 4096 Mar 23 14:00 app.py
│└┬┘└┬┘└┬┘   └──┬─┘ └──┬───┘
│ │  │  │       │      │
│ │  │  │     owner  group
│ │  │  └─ other: r-- (read only)
│ │  └──── group: r-x (read + execute)
│ └─────── user:  rwx (read + write + execute)
└───────── type: - (file), d (directory), l (symlink)

For files: r = read contents, w = write contents, x = execute as program

For directories: r = list contents, w = create/delete entries, x = traverse (enter the directory)

Gotcha: A directory with r but without x permission lets you ls the names but not cd into it or access any files inside. This catches everyone at least once.

chmod 755 file          # rwxr-xr-x
chmod 644 file          # rw-r--r--
chmod u+x file          # Add execute for user
chmod -R g+w dir/       # Recursive group write
chown user:group file
chown -R deploy:deploy /opt/app/

Special Bits¶

Bit	Octal	On Files	On Directories
SUID	4000	Run as file owner	(ignored)
SGID	2000	Run as file group	New files inherit directory's group
Sticky	1000	(ignored)	Only the file's owner, the directory's owner, or root can delete or rename entries (used on /tmp)

chmod u+s /usr/bin/passwd    # SUID — runs as root
chmod g+s /shared/           # SGID — inherit group
chmod +t /tmp/               # Sticky — only owner can delete
find / -perm -4000 -ls       # Find all SUID files
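
To connect the octal modes with the strings that `ls -l` prints, the short sketch below uses Python's standard `stat.filemode` function. The `describe` wrapper is a hypothetical convenience name; the modes are the ones discussed in this part, including the special bits.

```python
import stat


def describe(mode: int, file_type: int = stat.S_IFREG) -> str:
    """Render an octal permission mode the way `ls -l` prints it."""
    return stat.filemode(file_type | mode)


if __name__ == "__main__":
    print(describe(0o754))                  # the app.py example: user rwx, group r-x, other r--
    print(describe(0o4755))                 # SUID: "s" replaces the user execute bit
    print(describe(0o2775, stat.S_IFDIR))   # SGID directory: "s" in the group triplet
    print(describe(0o1777, stat.S_IFDIR))   # sticky directory such as /tmp: trailing "t"
```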

umask¶

Controls default permissions for new files:

umask              # Show current mask (e.g., 0022)
# File default:      0666 & ~0022 = 0644 (rw-r--r--)
# Directory default: 0777 & ~0022 = 0755 (rwxr-xr-x)
# The mask clears bits; it is not arithmetic subtraction
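
The umask works as a bit mask, not as a number to subtract: every bit set in the mask is cleared from the requested mode. A small sketch (the function name `effective_mode` is hypothetical) makes the difference visible with a mask such as 0033, where subtraction would give the wrong answer.

```python
def effective_mode(requested: int, umask: int) -> int:
    """Permissions a new file actually gets: requested bits with masked bits cleared."""
    return requested & ~umask


if __name__ == "__main__":
    for mask in (0o022, 0o027, 0o033):
        file_mode = effective_mode(0o666, mask)
        dir_mode = effective_mode(0o777, mask)
        print(f"umask {mask:04o}: files {file_mode:04o}, directories {dir_mode:04o}")
```

With umask 0033, new files come out as 0644 because the execute bits were never requested in the first place; plain subtraction would suggest 0633.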

ACLs and Capabilities¶

# ACLs: fine-grained permissions beyond user/group/other
getfacl file
setfacl -m u:deploy:rx file

# Capabilities: grant specific root powers without full root
getcap /usr/bin/ping
# → cap_net_raw=ep
setcap cap_net_bind_service=+ep /opt/myapp/server

Part 6: The Filesystem¶

Everything Is a File¶

In Linux, almost everything is represented as a file: regular files, directories, devices, sockets, pipes, and even kernel state (/proc, /sys).

The Directory Hierarchy¶

Path	Purpose
`/`	Root — everything starts here
`/bin`, `/usr/bin`	Essential/user binaries
`/sbin`, `/usr/sbin`	System administration binaries
`/etc`	Configuration files
`/var`	Variable data (logs, databases, mail, caches)
`/tmp`	Temporary files (often cleared on boot)
`/home`	User home directories
`/root`	Root user's home
`/proc`	Virtual filesystem — kernel/process state
`/sys`	Virtual filesystem — hardware/driver state
`/dev`	Device files
`/boot`	Kernel and bootloader
`/opt`	Optional/third-party software
`/mnt`, `/media`	Mount points

Filesystem Internals¶

Inodes: Every file has an inode — a metadata record containing mode, ownership, timestamps, size, and block pointers. The filename is stored in the directory entry, not the inode.

ls -i file                  # Show inode number
stat file                   # Full inode details
df -i                       # Inode usage per filesystem

Gotcha: df -h shows space is available, but writes fail? Check df -i — inodes might be exhausted. This happens with millions of tiny files (session stores, mail queues).

Hard links vs symlinks:
- Hard link: Another name pointing to the same inode. Deleting one name doesn't affect the other. Can't cross filesystems.
- Symlink: A pointer to a path. Can break if the target is deleted. Can cross filesystems.

ln file hardlink            # Hard link
ln -s file symlink          # Symbolic link
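
The difference between the two link types shows up as soon as the original name is removed. The following sketch, with a hypothetical `link_demo` function, works in a throwaway temporary directory and assumes a filesystem that supports both hard links and symlinks.

```python
import os
import tempfile


def link_demo() -> dict:
    """Create a file, a hard link, and a symlink, then delete the original name."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "file")
        hard = os.path.join(workdir, "hardlink")
        soft = os.path.join(workdir, "symlink")
        with open(original, "w") as handle:
            handle.write("payload")

        os.link(original, hard)       # same as: ln file hardlink
        os.symlink(original, soft)    # same as: ln -s file symlink

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        link_count = os.stat(original).st_nlink

        os.remove(original)           # drop one of the two names for the inode

        with open(hard) as handle:
            hard_still_readable = handle.read() == "payload"
        # The symlink itself still exists, but the path it points to is gone
        symlink_dangling = os.path.islink(soft) and not os.path.exists(soft)

    return {
        "same_inode": same_inode,
        "link_count": link_count,
        "hard_still_readable": hard_still_readable,
        "symlink_dangling": symlink_dangling,
    }


if __name__ == "__main__":
    print(link_demo())
```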

VFS — The Abstraction Layer¶

The Virtual Filesystem Switch lets Linux use the same syscalls (open, read, write) across all filesystems: ext4, XFS, tmpfs, NFS, overlayfs, procfs. Applications don't need to know which filesystem they're on.

Filesystem Types¶

Filesystem	Use Case	Max File Size	Journal	Notes
ext4	General purpose (default on Debian/Ubuntu)	16 TB	Yes	Mature, well-tested
XFS	Large files, high throughput (default on RHEL)	8 EB	Yes	Excellent at scale
Btrfs	Snapshots, checksums, compression	16 EB	CoW	Modern, more features
tmpfs	RAM-backed temporary files	RAM size	No	/tmp, /run
overlayfs	Container image layers	Varies	No	Used by Docker

Caveat: XFS can grow online but cannot be shrunk — ever. ext4 can be shrunk offline. This matters when planning storage.

Part 7: Storage¶

Block Devices and Partitions¶

lsblk                       # Block device tree
lsblk -f                    # With filesystem info
fdisk -l                    # Partition tables
blkid                       # UUID and filesystem types

LVM — Logical Volume Manager¶

LVM adds a virtualization layer between physical disks and filesystems:

Physical Disks → Physical Volumes (PV) → Volume Group (VG) → Logical Volumes (LV) → Filesystems
/dev/sda1 ──→ PV ─┐
                   ├──→ VG "data" ──→ LV "app" (ext4)
/dev/sdb1 ──→ PV ─┘               ──→ LV "logs" (xfs)

# Create
pvcreate /dev/sdb1
vgcreate data /dev/sdb1
lvcreate -L 50G -n app data
mkfs.ext4 /dev/data/app

# Extend (online!)
lvextend -L +20G /dev/data/app
resize2fs /dev/data/app         # ext4
xfs_growfs /mountpoint          # XFS

# Status
pvs                              # Physical volumes
vgs                              # Volume groups
lvs                              # Logical volumes

RAID Levels¶

Level	Disks	Redundancy	Speed	Use Case
RAID 0	2+	None	Fastest	Scratch/temp
RAID 1	2	Mirror	Read fast	Boot, small critical
RAID 5	3+	1 disk failure	Good	General purpose
RAID 6	4+	2 disk failures	Good	Large arrays
RAID 10	4+	Mirror + stripe	Excellent	Databases, high I/O
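
The redundancy column translates into usable capacity with simple arithmetic. The sketch below is a rough planning aid, not an mdadm interface: `usable_capacity` is a hypothetical helper, it assumes equal-size disks and two-way mirrors for RAID 10, and it ignores metadata overhead.

```python
# Minimum disk counts, taken from the table above
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity(level: str, disks: int, disk_size_tb: float) -> float:
    """Usable capacity in TB for equal-size disks (metadata overhead ignored)."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        return disks * disk_size_tb            # pure striping, no redundancy
    if level == "1":
        return disk_size_tb                    # every disk holds the same data
    if level == "5":
        return (disks - 1) * disk_size_tb      # one disk's worth of parity
    if level == "6":
        return (disks - 2) * disk_size_tb      # two disks' worth of parity
    if disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return (disks // 2) * disk_size_tb         # striped two-way mirrors


if __name__ == "__main__":
    for level, disks in (("0", 4), ("1", 2), ("5", 4), ("6", 6), ("10", 4)):
        print(f"RAID {level:>2} with {disks} x 4 TB: {usable_capacity(level, disks, 4.0)} TB usable")
```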

Disk Health¶

smartctl -a /dev/sda             # SMART data (health, errors, hours)
smartctl -t short /dev/sda       # Run short self-test
iostat -xz 1 5                   # I/O statistics per device
iotop                            # Per-process I/O usage

Mount Operations¶

mount /dev/sda1 /mnt             # Mount
umount /mnt                      # Unmount
mount -o remount,rw /            # Remount with different options
findmnt                          # Mount tree
cat /etc/fstab                   # Persistent mounts

Mount options that matter for security:

Option	Purpose
`noexec`	Prevent execution of binaries
`nosuid`	Ignore SUID/SGID bits
`nodev`	Ignore device files
`ro`	Read-only

LUKS (Linux Unified Key Setup) provides block-device encryption. Commonly used for full-disk encryption, unlocked during initramfs before root mount.

Part 8: Memory Management¶

The Big Picture¶

Linux intentionally puts otherwise idle RAM to work as page cache, on the principle that unused RAM is wasted RAM. "Free" memory isn't a goal; healthy reclaim is.

free -h
#               total   used   free   shared  buff/cache  available
# Mem:           16G    4.2G   512M    128M      11G        11G
# Swap:          4G     0B     4G

MemAvailable (not MemFree) is what matters — it includes reclaimable cache.
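
The gap between the two numbers is easy to see when `/proc/meminfo` is parsed directly. This is a minimal sketch with hypothetical helper names; the sample text uses made-up values for a roughly 16 GB machine, and on a live system the input would be the contents of `/proc/meminfo`.

```python
# Made-up values for illustration; the format mirrors /proc/meminfo
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:          512000 kB
MemAvailable:   11200000 kB
Buffers:          256000 kB
Cached:         10500000 kB
"""


def parse_meminfo(text: str) -> dict:
    """Turn 'Name:   12345 kB' lines into {name: kilobytes}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def headroom(meminfo: dict) -> dict:
    """Compare the misleading 'free' figure with the useful 'available' one."""
    total = meminfo["MemTotal"]
    return {
        "free_pct": 100 * meminfo["MemFree"] / total,
        "available_pct": 100 * meminfo["MemAvailable"] / total,
    }


if __name__ == "__main__":
    summary = headroom(parse_meminfo(SAMPLE_MEMINFO))
    print(f"free: {summary['free_pct']:.1f}%  available: {summary['available_pct']:.1f}%")
```

An alert keyed on `free_pct` would fire constantly on a healthy server; `available_pct` is the figure worth watching.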

Memory Types¶

Type	Purpose	Reclaimable?
Anonymous	Process heap, stack, mmap	Only to swap
Page cache	File data cached in RAM	Yes (automatically)
Slab cache	Kernel data structures (dentries, inodes)	Partially
Shared	Shared memory segments, tmpfs	Depends
Kernel	Kernel code and data	No

Virtual Memory¶

Every process gets its own virtual address space. The kernel maps virtual addresses to physical pages through page tables. This provides:
- Isolation between processes
- Lazy allocation (memory isn't physically allocated until used)
- Copy-on-write after fork()
- Memory-mapped files

cat /proc/PID/maps               # Virtual memory regions
cat /proc/PID/smaps_rollup       # Memory usage summary
pmap PID                         # Process memory map

The OOM Killer¶

When the system runs out of memory and swap, the kernel's OOM (Out Of Memory) killer selects a process to terminate based on oom_score.

# Check OOM kills
dmesg -T | grep -i "oom\|killed process"
journalctl -k | grep -i oom

# See OOM scores (higher = more likely to be killed)
cat /proc/PID/oom_score

# Protect a process from OOM killer
echo -1000 > /proc/PID/oom_score_adj

# Per-process memory usage
ps aux --sort=-%mem | head -15

Swap¶

Swap is disk-backed overflow for anonymous memory. When RAM is under pressure, the kernel moves cold pages to disk to free RAM for active use.

swapon --show                    # Active swap areas
cat /proc/swaps                  # Same
sysctl vm.swappiness             # How aggressively to swap (0-100; up to 200 on kernel 5.8+)
# 0 = only swap to avoid OOM
# 60 = default
# Lower values prefer dropping page cache

Gotcha: Swap on SSD is fine and much faster than spinning disk. Swap on NVMe is fast enough to be nearly transparent. But ANY swapping means you're under memory pressure — investigate the cause.

Part 9: Networking Fundamentals¶

IP Configuration¶

ip addr show                     # IP addresses
ip route show                    # Routing table
ip link show                     # Network interfaces
ip neigh show                    # ARP table

# Legacy commands (still common)
ifconfig                         # IP addresses (deprecated)
route -n                         # Routing table (deprecated)

DNS¶

DNS resolution path: Application → NSS (Name Service Switch) → resolver (systemd-resolved or direct) → DNS server. Test what the system resolves (not just DNS): getent hosts example.com

dig example.com +short           # Direct DNS query (bypasses NSS)
dig example.com @8.8.8.8         # Query specific server
dig -x 93.184.216.34             # Reverse lookup
host example.com                 # Simple lookup
getent hosts example.com         # What the system actually resolves (includes /etc/hosts)
cat /etc/resolv.conf             # DNS configuration
resolvectl status                # systemd-resolved state

TCP/IP Debugging¶

ss -tlnp                         # Listening TCP ports with process names
ss -s                            # Socket statistics summary
ss -tn state established         # Established connections
ss -Htn state time-wait | wc -l  # Count TIME_WAIT (-H drops the header line)

# Connectivity testing
ping host                        # ICMP reachability
traceroute host                  # Path to host
curl -v telnet://host:port       # TCP connectivity test
nc -zv host 80                   # Quick port check
tcpdump -i eth0 port 80          # Packet capture

TCP States You Need to Know¶

State	Meaning	Concern
LISTEN	Waiting for connections	Normal for servers
ESTABLISHED	Active connection	Normal
TIME_WAIT	Connection closed, waiting to expire	High count = many short connections
CLOSE_WAIT	Remote closed, local hasn't closed yet	Bug — application not closing sockets
SYN_SENT	Connection attempt in progress	High count = upstream unreachable

Gotcha: Many CLOSE_WAIT sockets = application bug. The remote side closed the connection but your application hasn't called close(). This causes file descriptor leaks.

Part 10: Firewalls — iptables¶

The Five Chains¶

Every packet goes through netfilter hooks where iptables rules are evaluated:

Incoming → PREROUTING → Routing decision → INPUT (for this host)
                                        → FORWARD (passing through)
Outgoing ← POSTROUTING ← OUTPUT (from this host)

Chain	When	Purpose
PREROUTING	Before routing	DNAT (change destination)
INPUT	Packets for this host	Firewall: allow/deny incoming
FORWARD	Packets passing through	Router/Docker/K8s
OUTPUT	Packets from this host	Control outgoing
POSTROUTING	After routing	SNAT/MASQUERADE (change source)

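Rules are evaluated top to bottom, and the first matching rule with a terminating target (ACCEPT, DROP, REJECT) wins. If nothing matches, the chain's default policy applies.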

iptables -L -n -v --line-numbers  # List all rules
iptables -t nat -L -n -v          # NAT rules

# Basic firewall
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -A INPUT -j DROP

# Save/restore
iptables-save > /tmp/rules.bak
iptables-restore < /tmp/rules.bak

Gotcha: Adding a DROP rule before your ACCEPT rules locks you out of SSH immediately on a remote server. Always put ESTABLISHED,RELATED first, then SSH, then DROP.
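
Why rule order matters can be shown with a toy model of first-match evaluation. This is a simplified sketch, not the iptables API: the `evaluate` function, the dictionary-based rule format, and the packet fields are all hypothetical, and real netfilter matching is considerably richer.

```python
def evaluate(packet: dict, rules: list, policy: str = "ACCEPT") -> str:
    """Return the target of the first rule whose every field matches the packet."""
    for match, target in rules:
        if all(packet.get(key) == value for key, value in match.items()):
            return target               # first match wins; later rules are never seen
    return policy                       # nothing matched: fall back to chain policy


SSH_PACKET = {"proto": "tcp", "dport": 22, "state": "NEW"}

GOOD_ORDER = [
    ({"state": "ESTABLISHED"}, "ACCEPT"),
    ({"proto": "tcp", "dport": 22}, "ACCEPT"),
    ({}, "DROP"),                       # an empty match applies to every packet
]

# Same rules, but with the catch-all DROP moved to the top
BAD_ORDER = [GOOD_ORDER[2], GOOD_ORDER[0], GOOD_ORDER[1]]

if __name__ == "__main__":
    print("DROP last: ", evaluate(SSH_PACKET, GOOD_ORDER))
    print("DROP first:", evaluate(SSH_PACKET, BAD_ORDER))
```

With the catch-all at the top, the SSH rule is never reached, which is the lockout described in the gotcha.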

History: Linux packet filtering has gone through four generations: ipfwadm (1994) → ipchains (1998) → iptables (2001) → nftables (2014). iptables remains more widely used than nftables because Docker, Kubernetes, fail2ban, and UFW all generate iptables rules.

nftables — The Modern Framework¶

nftables is the official successor to iptables (Linux 3.13+, 2014). It provides a unified syntax for IPv4/IPv6/ARP filtering, replacing the separate iptables/ip6tables/arptables/ebtables commands.

Run nft list ruleset to see all rules
Many distros still use iptables compatibility layers (Docker, K8s, fail2ban emit iptables rules)
On modern RHEL/Fedora, firewall-cmd wraps nftables

nft list ruleset
firewall-cmd --list-all

Part 11: SSH¶

What Happens When You Type `ssh server`¶

1. TCP connection to port 22
2. Key exchange (Diffie-Hellman): creates a shared secret without transmitting it
3. Server authentication: host key verified against ~/.ssh/known_hosts
4. User authentication: public key, password, or certificate

History: SSH was created in 1995 by Tatu Ylönen after a password-sniffing attack at Helsinki University of Technology. He chose port 22 because it sat between FTP (21) and Telnet (23), the protocols SSH was meant to replace. By his own account, he emailed IANA and had the port assigned the next day.

Key Types¶

Type	Recommendation	Notes
Ed25519	Use this	Fastest, most secure, smallest keys
RSA	Legacy, still works	Needs 4096-bit for security
ECDSA	Acceptable	Ed25519 is better
DSA	Never	Deprecated (broken at 1024-bit)

ssh-keygen -t ed25519 -C "deploy@company"
ssh-copy-id user@host

SSH Config — The Secret Weapon¶

# ~/.ssh/config
Host prod-*
    ProxyJump bastion.example.com
    User deploy
    IdentityFile ~/.ssh/deploy_ed25519

Host db-primary
    HostName 10.0.2.50
    Port 2222
    User postgres

SSH Tunneling¶

# Local forward: access remote service through local port
ssh -L 8080:localhost:80 user@remote
# Now localhost:8080 → remote's localhost:80

# Remote forward: expose local service to remote
ssh -R 8080:localhost:3000 user@remote

# SOCKS proxy: tunnel all traffic
ssh -D 1080 user@remote

# ProxyJump: SSH through bastion
ssh -J bastion.example.com internal-server

Agent Forwarding¶

eval $(ssh-agent)
ssh-add ~/.ssh/id_ed25519
ssh -A bastion                   # Forward agent to bastion
# Now from bastion, you can SSH to internal hosts using your local key

Security warning: Agent forwarding on untrusted hosts lets root on that host use your key. Prefer ProxyJump instead.

Part 12: The /proc Filesystem¶

/proc is a virtual filesystem that exposes kernel state as files. Many standard tools (ps, top, free, lsof) get their data from /proc.

Per-Process: /proc/PID/¶

cat /proc/$$/cmdline | tr '\0' ' '       # Command line
ls -la /proc/$$/exe                       # Binary path
ls -la /proc/$$/cwd                       # Working directory
cat /proc/$$/environ | tr '\0' '\n'       # Environment variables
cat /proc/$$/status                       # State, memory, threads
ls -la /proc/$$/fd/                       # Open file descriptors
cat /proc/$$/maps                         # Memory regions

Gotcha: /proc/PID/environ shows ALL environment variables — including DATABASE_URL and API_KEY. Anyone with access (same user or root) can read your secrets.

System-Wide¶

cat /proc/meminfo                # Memory details
cat /proc/cpuinfo                # CPU information
cat /proc/loadavg                # Load averages
cat /proc/uptime                 # Uptime in seconds
cat /proc/net/tcp                # TCP connections (hex)
cat /proc/sys/kernel/pid_max     # Max PID value
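
The hex in `/proc/net/tcp` is readable once the encoding is known. The sketch below decodes one line by hand. It assumes a little-endian host (x86, most ARM), where the IPv4 address appears byte-reversed; the sample line and the helper names are illustrative, while the state codes follow the kernel's TCP state numbering.

```python
import socket
import struct

TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def decode_endpoint(field: str) -> tuple:
    """Turn '0100007F:1F90' into ('127.0.0.1', 8080) on a little-endian host."""
    hex_ip, hex_port = field.split(":")
    # The kernel prints the address as a native 32-bit integer, so repack it
    # little-endian to recover the original byte order.
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def decode_line(line: str) -> dict:
    """Decode the local address, remote address, and state columns of one entry."""
    fields = line.split()
    return {
        "local": decode_endpoint(fields[1]),
        "remote": decode_endpoint(fields[2]),
        "state": TCP_STATES.get(fields[3].upper(), "UNKNOWN"),
    }


# Illustrative entry: a listener on 127.0.0.1:8080
SAMPLE_LINE = "   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345"

if __name__ == "__main__":
    print(decode_line(SAMPLE_LINE))
```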

Practical: Deleted Files Still Using Space¶

# Find files deleted from disk but still held open by processes
find /proc/*/fd -ls 2>/dev/null | grep deleted
# The space won't be freed until the process closes the file or exits
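
The mechanism behind this is that deleting a file removes a name, not the data; the inode survives while any descriptor still refers to it. A minimal sketch, assuming a Unix-like system and using a hypothetical `deleted_but_open_demo` function, reproduces the situation with a temporary file.

```python
import os
import tempfile


def deleted_but_open_demo() -> dict:
    """Unlink a file while holding it open, then inspect what is left."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                         # same as: rm file

        info = os.fstat(fd)                     # the inode is still reachable via the fd
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)

        # On Linux, /proc shows the old path with a "(deleted)" suffix
        proc_link = f"/proc/self/fd/{fd}"
        proc_target = os.readlink(proc_link) if os.path.islink(proc_link) else None

        return {
            "name_exists": os.path.exists(path),
            "link_count": info.st_nlink,
            "size": info.st_size,
            "data": data,
            "proc_target": proc_target,
        }
    finally:
        os.close(fd)                            # only now can the blocks be freed


if __name__ == "__main__":
    print(deleted_but_open_demo())
```

This is why restarting, or signalling, the process that holds a deleted log file is often what finally frees the disk space.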

Part 13: Debugging with strace¶

strace shows every system call a process makes — every file opened, byte read/written, network connection created.

# Trace a running process
strace -p 12345

# Trace with timing
strace -p 12345 -t -T
# -t = timestamp, -T = time spent in each syscall

# Trace specific syscalls
strace -e trace=openat,read,write ./myapp   # modern glibc opens files via openat, not open
strace -e trace=network ./myapp
strace -e trace=file ./myapp

# Follow child processes
strace -f ./deploy.sh

Pattern: The Stuck Process¶

strace -p 12345
# → read(5, [hangs here]
# Process is blocked on read from fd 5
ls -la /proc/12345/fd/5
# → socket:[89012] — waiting on a database response

Pattern: The Slow Startup¶

strace -T -e trace=openat,connect ./myapp 2>&1 | sort -t'<' -k2 -rn | head
# → connect(3, {...5432...}) = 0 <5.012>   ← 5 seconds to database!
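
When a trace has been saved to a file, the same "sort by time spent" idea is easy to script. The sketch below assumes `strace -T` style lines that end in a `<seconds>` suffix. The sample lines are invented for illustration, not captured output, and `slowest_calls` is a hypothetical helper name.

```python
import re

# strace -T appends the time spent in each syscall as "<seconds>" at end of line
DURATION = re.compile(r"<(\d+\.\d+)>\s*$")


def slowest_calls(lines, top: int = 3) -> list:
    """Return the `top` slowest syscalls as (seconds, line) pairs, slowest first."""
    timed = []
    for line in lines:
        match = DURATION.search(line)
        if match:                                # skip lines without a duration
            timed.append((float(match.group(1)), line.strip()))
    timed.sort(key=lambda item: item[0], reverse=True)
    return timed[:top]


# Invented sample trace for illustration
SAMPLE_TRACE = [
    'openat(AT_FDCWD, "/etc/myapp.conf", O_RDONLY) = 3 <0.000021>',
    'connect(3, {sa_family=AF_INET, sin_port=htons(5432)}, 16) = 0 <5.012000>',
    'openat(AT_FDCWD, "/var/lib/myapp/data.db", O_RDWR) = 4 <0.000130>',
    '+++ exited with 0 +++',
]

if __name__ == "__main__":
    for seconds, call in slowest_calls(SAMPLE_TRACE):
        print(f"{seconds:>10.6f}s  {call}")
```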

Pattern: Permission Denied¶

strace -e trace=file ./myapp 2>&1 | grep EACCES
# → openat(AT_FDCWD, "/var/lib/myapp/data.db", O_RDWR) = -1 EACCES

Part 14: Performance Triage¶

The USE Method¶

For each resource, check Utilization, Saturation, Errors:

Resource	Utilization	Saturation	Errors
CPU	`uptime` (load avg), `mpstat`	Run queue length	`dmesg`
Memory	`free -h`, `vmstat`	Swap activity, OOM	`dmesg | grep oom`
Disk	`iostat -xz`, `df -h`	`await` in iostat	`dmesg | grep error`
Network	`sar -n DEV`, `ss -s`	Overflows, drops	`nstat`, `ip -s link`

Quick Triage Sequence¶

uptime                           # Load averages
dmesg -T | tail -20              # Recent kernel messages
free -h                          # Memory
df -h                            # Disk space
df -i                            # Inodes
iostat -xz 1 3                   # Disk I/O
ss -s                            # Socket summary
ps aux --sort=-%cpu | head -10   # Top CPU consumers
ps aux --sort=-%mem | head -10   # Top memory consumers

Load Average Decoded¶

uptime
# → load average: 4.50, 3.20, 2.10
#                 1min  5min  15min

Load average = number of runnable + uninterruptibly sleeping processes. Compare to CPU count (nproc):

- Load < CPU count: system has headroom
- Load = CPU count: fully utilized
- Load > 2× CPU count: significant saturation

Gotcha: High load with low CPU% often means I/O wait — processes blocked on disk. Check iostat -xz 1.
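
The comparison against the CPU count can be scripted. This is a rough sketch: the function name `load_status` is made up, and the label for the band between 1× and 2× the CPU count is an assumption, since the thresholds above do not name it.

```shell
# Classify a load average against the CPU count.
# Usage: load_status LOAD NCPU
load_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) { print "invalid CPU count"; exit 1 }
        ratio = load / cpus
        if      (ratio < 1)  state = "headroom"
        else if (ratio <= 2) state = "at or over capacity"   # assumed label
        else                 state = "significant saturation"
        printf "%.2f per CPU: %s\n", ratio, state
    }'
}

# Live check: the first field of /proc/loadavg is the 1-minute average
# load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A ratio alone does not say whether the load comes from CPU work or from tasks blocked on I/O, so it is a starting point rather than a diagnosis.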

Part 15: Logging¶

Log Locations¶

/var/log/syslog (or /var/log/messages)  — System log
/var/log/auth.log (or /var/log/secure)   — Authentication events
/var/log/kern.log                        — Kernel messages
/var/log/nginx/access.log               — Web server access
/var/log/nginx/error.log                — Web server errors

journalctl Essentials¶

journalctl -u nginx -f                  # Follow service logs
journalctl -u nginx --since "1 hour ago"
journalctl -p err -b                    # Errors since boot
journalctl -xe                          # Jump to newest entries, with explanatory text
journalctl -k                           # Kernel messages
journalctl --vacuum-size=500M           # Trim logs

logrotate¶

# /etc/logrotate.d/myapp
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        systemctl reload myapp
    endscript
}

Part 16: Package Management¶

Debian/Ubuntu (apt/dpkg)¶

apt update                       # Refresh package index
apt upgrade                      # Upgrade all packages
apt install nginx                # Install
apt remove nginx                 # Remove (keep config)
apt purge nginx                  # Remove with config
apt search keyword               # Search
dpkg -l | grep nginx             # Check installed
dpkg -L nginx                    # List files from package
apt-cache policy nginx           # Version/repo info

RHEL/Fedora/CentOS (dnf/rpm)¶

dnf install nginx
dnf remove nginx
dnf search keyword
dnf list installed | grep nginx
rpm -qa | grep nginx             # Check installed
rpm -ql nginx                    # List files
dnf info nginx                   # Package details

Part 17: Text Processing¶

The Pipeline Philosophy¶

# Chain tools with pipes
cat access.log | grep "500" | awk '{print $1}' | sort | uniq -c | sort -rn | head -10

grep — Find Lines¶

grep "ERROR" /var/log/syslog
grep -i "error" file             # Case-insensitive
grep -r "TODO" /src/             # Recursive
grep -c "500" access.log         # Count matches
grep -v "DEBUG" file             # Invert (exclude)
grep -E "error|warning" file     # Extended regex (OR)
grep -A 3 "FATAL" file           # 3 lines after match
grep -B 2 "FATAL" file           # 2 lines before

awk — Field Processing¶

awk '{print $1}' access.log                    # First field
awk -F: '{print $1, $7}' /etc/passwd           # Custom delimiter
awk '$9 >= 500 {print $1, $7, $9}' access.log  # Filter by field value
awk '{sum+=$10} END {print sum}' access.log     # Sum a column
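
awk's associative arrays make grouped counts straightforward. The following sketch tallies requests per HTTP status class; it assumes, like the examples above, that the status code is field 9 (as in the common nginx/Apache combined log format). The function name `status_summary` is illustrative.

```shell
# Count requests per HTTP status class (2xx, 3xx, 4xx, 5xx).
# Reads log lines from the named files, or from stdin if none are given.
status_summary() {
    awk '$9 ~ /^[1-5][0-9][0-9]$/ { count[substr($9, 1, 1) "xx"]++ }
         END { for (c in count) print c, count[c] }' "$@" | sort
}

# Usage: status_summary access.log
```

The regex guard skips lines where field 9 is not a three-digit status, which keeps malformed requests from polluting the counts.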

sed — Stream Editing¶

sed 's/old/new/g' file           # Replace all occurrences
sed -i 's/old/new/g' file        # In-place edit
sed -n '10,20p' file             # Print lines 10-20
sed '/pattern/d' file            # Delete matching lines

Other Essential Tools¶

sort file                        # Sort lines
sort -rn file                    # Reverse numeric sort
uniq -c                          # Count duplicates (requires sorted input)
wc -l file                       # Count lines
cut -d: -f1 /etc/passwd          # Extract fields
tr 'a-z' 'A-Z'                  # Translate characters
head -20 file                    # First 20 lines
tail -f file                     # Follow file growth
tee file                         # Write to file AND stdout
xargs                            # Build commands from stdin

Part 18: cgroups and Namespaces¶

cgroups — Resource Control¶

cgroups limit, account for, and isolate resource usage (CPU, memory, I/O) of process groups. This is the foundation of container resource limits.

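Note: Modern distributions use cgroup v2 (unified hierarchy) by default. cgroup v1 used separate hierarchies per controller. File paths under /sys/fs/cgroup differ between the two: v1 has a directory per controller (memory/, cpu/, ...), while v2 has a single tree with files such as memory.current.

# See cgroup hierarchy
systemd-cgtop                    # Live cgroup resource usage
# cgroup v1 layout:
cat /sys/fs/cgroup/memory/docker/CONTAINER_ID/memory.usage_in_bytes
# cgroup v2 layout (with Docker's systemd cgroup driver):
cat /sys/fs/cgroup/system.slice/docker-CONTAINER_ID.scope/memory.current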
cat /proc/PID/cgroup             # Which cgroup a process belongs to

# systemd sets cgroups via unit file directives:
# MemoryMax=512M
# CPUQuota=200%
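
Because file names and paths depend on the cgroup version, it is worth checking which one a host runs before reading anything under /sys/fs/cgroup. One common check is the filesystem type mounted there. The sketch below assumes GNU `stat`; the function name `cgroup_version` is hypothetical.

```shell
# Map the filesystem type mounted at /sys/fs/cgroup to a cgroup version.
# With no argument, the type is read from the live system via stat.
cgroup_version() {
    fstype=${1:-$(stat -fc %T /sys/fs/cgroup 2>/dev/null)}
    case "$fstype" in
        cgroup2fs) echo "v2" ;;        # unified hierarchy
        tmpfs)     echo "v1" ;;        # v1 (or hybrid): controllers mounted under a tmpfs
        *)         echo "unknown" ;;
    esac
}

# Usage: cgroup_version
```

On a v2 host, the memory usage of a systemd service is then typically readable from a path like /sys/fs/cgroup/system.slice/nginx.service/memory.current.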

Namespaces — Isolation¶

Namespaces provide isolated views of system resources:

Namespace	Isolates	Container Use
PID	Process IDs	Container sees own PID 1
Network	Network stack	Container gets own IP
Mount	Filesystem mounts	Container sees own root
UTS	Hostname	Container has own hostname
User	UID/GID mappings	Rootless containers
IPC	IPC objects	Isolated shared memory

# Create a network namespace
ip netns add test
ip netns exec test ip addr show
# → only loopback exists in this namespace

# See namespaces of a process
ls -la /proc/PID/ns/

cgroups + namespaces = containers. Docker, Kubernetes, and LXC all use these kernel features.
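
Each entry under /proc/PID/ns/ is a symlink whose target names the namespace (for example `net:[4026531840]`), so two processes share a namespace exactly when those targets match. A small sketch of that comparison follows; the function name `same_ns` is an assumption, and reading another user's process usually requires root.

```shell
# Succeed if two processes share the given namespace type
# (net, pid, mnt, uts, ipc, user). Returns 2 if either link is unreadable.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# Usage: is the container's main process in the host network namespace?
# same_ns 1 CONTAINER_PID net && echo "shares host network"
```

For a containerized process, comparing against PID 1 of the host on `net`, `pid`, and `mnt` shows quickly which isolation boundaries are actually in place.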

Part 19: Security Hardening¶

SSH Hardening¶

# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy admin
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
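
Before restarting sshd, `sshd -t` checks the configuration for syntax errors and `sshd -T` prints the effective settings. As a complement, the sketch below scans a config file for two directives that contradict the hardening above. It is deliberately simple: the function name `audit_sshd_config` is hypothetical, and it ignores Match blocks, Include files, and directives left at their defaults.

```shell
# Report directives in an sshd_config-style file that allow root or password logins.
# Prints one line per finding; exits non-zero if anything was found.
audit_sshd_config() {
    awk '
        /^[ \t]*#/ { next }                                  # skip comment lines
        { key = tolower($1); val = tolower($2) }             # keywords are case-insensitive
        key == "permitrootlogin"        && val != "no" { print "PermitRootLogin " $2; bad = 1 }
        key == "passwordauthentication" && val != "no" { print "PasswordAuthentication " $2; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Usage: audit_sshd_config /etc/ssh/sshd_config || echo "review the findings above"
```

Keep an existing SSH session open while testing changes, so a mistake does not lock you out.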

Firewall Basics¶

# UFW (Uncomplicated Firewall — Ubuntu)
ufw default deny incoming
ufw allow ssh                    # Allow SSH before enabling, or you can lock yourself out
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable
ufw status verbose

SELinux (RHEL/CentOS)¶

getenforce                       # Current mode: Enforcing/Permissive/Disabled
setenforce 0                     # Set permissive (temporary)
ausearch -m avc -ts recent       # Recent denials
sealert -a /var/log/audit/audit.log  # Human-readable alerts
restorecon -Rv /var/www/         # Fix file contexts

Kernel Hardening (sysctl)¶

# /etc/sysctl.d/99-hardening.conf
# Comments must start a line; the sysctl.d format does not define trailing comments.
# Reverse path filtering
net.ipv4.conf.all.rp_filter = 1
# Ignore ICMP redirects, and do not send them
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
# SYN flood protection
net.ipv4.tcp_syncookies = 1
# Restrict dmesg access
kernel.dmesg_restrict = 1
# Prevent hardlink and symlink attacks
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
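
Settings in this file are loaded at boot, or on demand with `sysctl --system`. Every sysctl key also maps to a file under /proc/sys, which gives a quick way to confirm the running value. The helper below is a sketch; `sysctl_path` is a made-up name, and it does not handle keys that contain an interface name with a dot in it (such as a VLAN interface).

```shell
# Translate a sysctl key into its /proc/sys path (dots become slashes).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Usage: read the live value straight from the kernel
# cat "$(sysctl_path net.ipv4.tcp_syncookies)"
```

`sysctl -n net.ipv4.tcp_syncookies` returns the same value; the path form is handy in minimal environments where the sysctl binary is missing.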

Audit¶

auditctl -w /etc/passwd -p wa -k passwd_changes  # Watch file changes
ausearch -k passwd_changes                         # Search audit log

Part 20: eBPF — The Linux Superpower¶

eBPF lets you run sandboxed programs inside the Linux kernel without changing kernel code or loading kernel modules. It's used for networking (Cilium), security (Falco, Tetragon), and observability (bpftrace, bcc).

# bpftrace one-liners
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'
# → Trace every file open across the entire system

bpftrace -e 'tracepoint:syscalls:sys_enter_connect { printf("%s connecting\n", comm); }'
# → Trace every network connection

# bcc tools (pre-built eBPF tools)
execsnoop                        # Trace new processes
opensnoop                        # Trace file opens
biolatency                       # Block I/O latency histogram
tcpconnect                       # Trace outbound TCP connections

History: BPF (Berkeley Packet Filter) was created in 1992 for packet filtering. In 2014, Alexei Starovoitov extended it into eBPF — a general-purpose in-kernel virtual machine. The kernel verifier ensures eBPF programs can't crash the kernel, loop forever, or access invalid memory.

Part 21: Linux Distributions¶

Distro Family	Examples	Package Manager	Default FS	Use Case
Debian	Debian, Ubuntu, Mint	apt/dpkg	ext4	Servers, desktops, cloud
Red Hat	RHEL, CentOS, Fedora, Rocky, Alma	dnf/rpm	XFS	Enterprise, compliance
Arch	Arch, Manjaro	pacman	ext4	Rolling release, DIY
Alpine	Alpine	apk	ext4	Containers (tiny, musl)
SUSE	openSUSE, SLES	zypper/rpm	Btrfs	Enterprise, snapshots

Key difference: Debian-family uses /etc/apt/, .deb packages, systemctl. Red Hat-family uses /etc/yum.repos.d/, .rpm packages, systemctl. The core Linux is the same — differences are in packaging, default configs, and support models.

Part 22: On-Call Survival Guide¶

Disk Full¶

df -h                                          # Which filesystem is full?
du -sh /var/log/* | sort -rh | head -10        # Biggest log directories
journalctl --vacuum-size=500M                  # Trim journal
find /var -xdev -type f -size +100M            # Large files
lsof +L1                                       # Deleted but still open files

OOM Killer¶

dmesg -T | grep -i "oom\|killed process"       # What was killed?
free -h                                         # Current memory state
ps aux --sort=-%mem | head -15                  # Memory hogs

Service Failed¶

systemctl status SERVICE                        # State + recent logs
journalctl -u SERVICE -n 50 --no-pager          # Full error output
ss -tlnp | grep PORT                            # Port conflict?
systemctl restart SERVICE                       # Try restart

High Load¶

uptime && nproc                                 # Load vs CPU count
top -bn1 | head -25                             # Process overview
iostat -xz 1 3                                  # I/O wait?
iotop -a -b -n 3 | head -20                     # Which process?

Safe vs Dangerous Actions¶

Safe (do without asking)	Dangerous (get approval)
Read df, top, ps, dmesg, free	kill -9 any process
journalctl (read logs)	Restart critical services
lsof, ss (read sockets)	Delete files to free space
Journal vacuum	docker/crictl prune
systemctl status	Reboot the host

Part 23: Real-World Case Studies¶

Case 1: OOM Killer Takes Down the App¶

Symptom: Java application crashes every few hours. No application error logs.

Investigation: dmesg | grep oom reveals the kernel OOM killer terminating the Java process. The JVM heap was configured at 4GB on a 4GB host — leaving no room for the kernel, page cache, or other processes.

Fix: Set JVM heap to 75% of available memory. Add MemoryMax= to the systemd unit. Monitor with /proc/PID/status VmRSS.
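
The heap figure can be derived from the host instead of hard-coded. A minimal sketch, assuming /proc/meminfo-style input and a 75% default; the function name `suggest_heap_mb` is illustrative, and the right percentage depends on what else runs on the host.

```shell
# Suggest a JVM heap size in MB as a percentage of MemTotal.
# Reads /proc/meminfo-style text on stdin; the percentage defaults to 75.
suggest_heap_mb() {
    awk -v pct="${1:-75}" '$1 == "MemTotal:" { printf "%d\n", $2 / 1024 * pct / 100 }'
}

# Usage: suggest_heap_mb < /proc/meminfo
```

On a 4 GB host this prints 3072, which would translate to `-Xmx3072m`. The JVM also uses memory outside the heap, so a MemoryMax= limit needs to sit above the heap size.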

Case 2: Disk "Full" but df Shows Space¶

Symptom: Application can't create new files. df -h shows 60% used.

Investigation: df -i shows 100% inode usage. The mail spool (/var/spool/mail/) contained 2 million tiny files — one per unread notification. Each file used one inode even though it was only a few bytes.

Fix: Clean up mail spool. Move to a filesystem with more inodes, or recreate it with an explicit inode count (for example mkfs.ext4 -N), since the inode count is set when the filesystem is created.
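
Inode exhaustion is easy to alert on before it bites. The sketch below filters `df -i` output for filesystems above a threshold; the function name `inode_pressure` and the 90% default are assumptions, and mount points containing spaces are not handled.

```shell
# List filesystems whose inode usage is at or above a threshold (default 90%).
# Reads `df -i` output on stdin; columns: Filesystem Inodes IUsed IFree IUse% Mounted-on.
inode_pressure() {
    awk -v limit="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # "100%" -> "100"; a "-" entry stays non-numeric
        if (use ~ /^[0-9]+$/ && use + 0 >= limit) print $6, use "%"
    }'
}

# Usage: df -iP | inode_pressure 80
```

Filesystems that report "-" for IUse% (some do not have a fixed inode table) are skipped rather than treated as zero.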

Case 3: Zombie Processes Filling PID Space¶

Symptom: fork() fails with EAGAIN. New processes can't start.

Investigation: ps -eo stat | grep -c '^Z' shows 15,000 zombie processes. A poorly written monitoring script spawned child processes but never called wait(). Dead children accumulated as zombies until PID space was exhausted.

Fix: Fix the parent process to reap children. Kill the parent (orphaned zombies are adopted and reaped by PID 1). Increase kernel.pid_max as temporary relief.
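
To find the parent that is failing to reap its children, group the zombies by parent PID. A small sketch, with the hypothetical name `zombies_by_parent`, that works on `ps -eo pid,ppid,stat,comm` output:

```shell
# Count zombie processes per parent PID, busiest parent first.
# Reads `ps -eo pid,ppid,stat,comm` output on stdin; prints "COUNT PPID".
zombies_by_parent() {
    awk 'NR > 1 && $3 ~ /^Z/ { n[$2]++ }
         END { for (p in n) print n[p], p }' | sort -rn
}

# Usage: ps -eo pid,ppid,stat,comm | zombies_by_parent | head
```

The PPID at the top of the list is the process to fix or restart.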

Case 4: systemd Service Flapping¶

Symptom: Service starts, runs for 2 seconds, crashes, restarts, repeats. Eventually hits start-limit-hit.

Investigation: journalctl -u myapp shows the app exits immediately with "config file not found." The config file exists at /etc/myapp/config.yaml, but systemd runs the service with WorkingDirectory=/opt/myapp, and the app uses a relative path ./config.yaml.

Fix: Use absolute paths in the app config, or set WorkingDirectory= correctly. Reset the start limit: systemctl reset-failed myapp.

Case 5: Runaway Logs Fill Root Disk¶

Symptom: System becomes unresponsive. SSH login is slow, commands fail with "No space left on device."

Investigation: df -h shows / at 100%. du -sh /var/log/* | sort -rh reveals a 45GB access.log. Logrotate was configured but the cron job wasn't running (cron service was disabled during a security audit and never re-enabled).

Fix: truncate -s 0 /var/log/nginx/access.log (safer than rm — avoids the deleted-but-open-file problem). Re-enable cron. Separate /var on its own partition to prevent log growth from bricking the root filesystem.
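
The difference between truncating and deleting is easy to demonstrate. In this sketch a "writer" holds a log open in append mode, as most daemons do, while the file is truncated underneath it. The function name `demo_truncate` and the use of fd 8 are illustrative; it assumes the coreutils `truncate` command.

```shell
# Truncate a log while a writer still has it open, then keep writing.
demo_truncate() {
    log=$(mktemp) || return 1
    exec 8>>"$log"            # writer keeps the log open in append mode
    echo "old data" >&8
    truncate -s 0 "$log"      # space is released at once; same inode, same path
    echo "new data" >&8       # the writer carries on without a restart
    cat "$log"
    exec 8>&-
    rm -f "$log"
}

demo_truncate
```

Only the line written after the truncate remains in the file. A writer that does not use append mode would keep its old file offset, leaving a sparse gap at the start of the file.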

Case 6: Kernel Soft Lockup¶

Symptom: dmesg shows BUG: soft lockup - CPU#3 stuck for 22s! Intermittent system freezes.

Investigation: A kernel module (buggy storage driver) holds a spinlock for too long, preventing the soft lockup watchdog from running. The kernel reports it but doesn't crash (soft lockup, not hard lockup).

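Fix: Update or replace the problematic driver. As temporary mitigation, raise kernel.watchdog_thresh (the soft lockup threshold is twice this value) or disable the affected hardware. Note that kernel.softlockup_panic is an on/off switch that makes the kernel panic on a soft lockup, not a threshold.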

Glossary¶

Term	Definition
Kernel	Core of the OS — controls CPU, memory, devices, provides syscalls
Syscall	Interface between userspace and kernel (open, read, write, fork, exec)
PID	Unique process identifier
PID 1	Init process (systemd) — parent of all processes, kernel panics if it exits
Inode	File metadata record (permissions, timestamps, block pointers) — not the filename
File descriptor (FD)	Number referencing an open file/socket/pipe (0=stdin, 1=stdout, 2=stderr)
VFS	Virtual Filesystem Switch — unified interface across all filesystem types
Page cache	RAM used to cache file data — automatically reclaimed under pressure
cgroup	Control group — limits CPU, memory, I/O for a group of processes
Namespace	Isolation boundary (PID, network, mount, user) — foundation of containers
Unit	systemd-managed object: service, socket, timer, mount, target
Target	systemd equivalent of runlevel — group of units (multi-user.target, graphical.target)
Initramfs	Temporary RAM filesystem for finding and mounting the real root
GRUB	GRand Unified Bootloader — loads the kernel from disk
ESP	EFI System Partition — FAT32 partition with bootloader binaries
MBR	Master Boot Record — 512-byte boot sector (legacy, max 2TB)
GPT	GUID Partition Table — modern, supports 128 partitions and huge disks
LVM	Logical Volume Manager — virtualization layer between disks and filesystems
RAID	Redundant Array of Independent Disks — mirroring/striping for reliability/speed
Swap	Disk area used as overflow when RAM is exhausted
OOM killer	Kernel mechanism that kills processes when memory is completely exhausted
SIGTERM	Signal 15 — graceful shutdown request (catchable)
SIGKILL	Signal 9 — immediate termination (cannot be caught)
Zombie	Process that has exited but parent hasn't called wait() — consumes only PID entry
Orphan	Process whose parent died — adopted by PID 1
iowait	CPU time spent waiting for I/O completion — suggests storage bottleneck
Load average	Number of runnable + uninterruptibly sleeping processes
umask	Default permission mask for newly created files
SUID	Set User ID — executable runs as file owner (e.g., passwd runs as root)
SELinux	Security-Enhanced Linux — mandatory access control system
AppArmor	Application Armor — path-based mandatory access control
eBPF	Extended Berkeley Packet Filter — sandboxed kernel programs for observability
strace	Traces system calls between a process and the kernel
iptables	Packet filtering framework using netfilter hooks
SSH	Secure Shell — encrypted protocol for remote access (port 22)
TOFU	Trust On First Use — SSH's security model for host verification
TTY	Terminal device — from TeleTYpewriter, now virtual terminal

Trivia and History¶

BIOS survived 40 years. Created by Gary Kildall for CP/M in 1975, adopted by IBM in 1981. UEFI didn't fully replace it until around 2020.
GRUB is a small operating system. It has filesystem drivers, a shell, a scripting language, and a network stack. The name is a physics reference — "Grand Unified" like a Grand Unified Theory.
The kernel decompresses itself. vmlinuz (the "z" = compressed) contains a decompression stub that unpacks the real kernel. The compressed image is ~12MB; uncompressed is 30-50MB.
PID 1 is unkillable. The kernel drops unhandled signals sent to PID 1. If PID 1 exits, the kernel panics. In containers, this causes problems — docker stop sends SIGTERM, but if your app is PID 1 and doesn't handle it, Docker waits 10 seconds then SIGKILLs.
systemd is the most controversial Linux project. Announced by Lennart Poettering in 2010, it replaced SysV init (1983). The Debian vote in 2014 nearly split the project. Devuan was forked specifically to maintain systemd-free Debian.
The rc in rc.d stands for "run commands." From AT&T System V Unix (1983). The S01/S02 numbering convention used shell globbing for sequencing — it worked for 30 years.
systemd can boot in under 2 seconds. Parallel service startup, socket activation, and aggressive dependency management on SSD hardware. systemd-analyze shows exactly where time is spent.
SSH was born from a password-sniffing attack. Tatu Ylönen wrote it in 1995 after thousands of plaintext Telnet/FTP passwords were captured at Helsinki University of Technology. Port 22 was unassigned and sat between FTP (21) and Telnet (23).
Ed25519 was designed by the SYN cookies inventor. Daniel J. Bernstein designed it to resist timing side-channel attacks. Public key is 68 characters vs 372+ for RSA.
The 0x55AA boot signature is from 1981. Every x86 machine still checks for these two bytes at offset 510-511 to decide if a disk is bootable.
Linux packet filtering has had four generations. ipfwadm (1994) → ipchains (1998) → iptables (2001) → nftables (2014). iptables remains dominant because Docker, Kubernetes, fail2ban, and UFW all generate iptables rules.
eBPF started as a packet filter in 1992. Berkeley Packet Filter was extended in 2014 by Alexei Starovoitov into a general-purpose in-kernel virtual machine. Now used for networking (Cilium), security (Falco), and observability (bpftrace).
Linux uses ALL your RAM on purpose. Unused RAM is wasted RAM. Linux fills it with page cache (file data). The "available" column in free -h (MemAvailable in /proc/meminfo) is what actually matters, not "free" (MemFree).
/proc was just for processes originally. The name means "process." Linux stuffed more and more kernel state into it over time. Plan 9 (Bell Labs, 1992) took the concept further — everything is a file, including the CPU and network stack.
Inodes run out before disk space. A filesystem with 0% disk used but 100% inodes used refuses to create new files. This happens with millions of tiny files (session stores, mail queues).

Flashcard Review¶

Boot and Kernel¶

Q	A
What are the boot stages in order?	Firmware (BIOS/UEFI) → GRUB → Kernel → Initramfs → PID 1 (systemd)
What is initramfs for?	Temporary RAM filesystem with drivers/tools to find and mount the real root
What happens if PID 1 exits?	Kernel panic — system halts
What is the kernel command line?	Parameters passed to the kernel by GRUB (`cat /proc/cmdline`)
How do you boot to rescue mode?	Edit GRUB entry, add `systemd.unit=rescue.target` or `single`

Processes and Signals¶

Q	A
SIGTERM vs SIGKILL?	SIGTERM (15) = graceful, catchable. SIGKILL (9) = immediate, uncatchable
What is a zombie process?	Exited process whose parent hasn't called wait() — consumes only a PID entry
What does `kill -0 PID` do?	Tests if process exists without sending a signal
What does load average measure?	Number of runnable + uninterruptibly sleeping processes

Filesystems and Storage¶

Q	A
What is an inode?	File metadata record (permissions, timestamps, block pointers). Filename is in the directory entry.
`df` shows space but writes fail — why?	Inode exhaustion (`df -i`), read-only remount, or quota
Hard link vs symlink?	Hard link = same inode, can't cross filesystems. Symlink = path pointer, can break
What does LVM add?	Virtualization layer: resize, snapshot, span disks without filesystem changes

Memory¶

Q	A
What matters more: MemFree or MemAvailable?	MemAvailable — includes reclaimable cache
What is the OOM killer?	Kernel kills processes when memory + swap are exhausted
What does swap do?	Moves inactive memory pages to disk to free RAM

Networking and Security¶

Q	A
Many CLOSE_WAIT sockets — what's wrong?	Application bug — not closing connections after remote closes
iptables rule order — what matters?	First match wins. Put specific ACCEPT rules before a broad DROP
SSH key type to use?	Ed25519 — fast, compact keys, strong security
`yaml.safe_load()` vs `yaml.load()`?	safe_load prevents code execution — always use it
What does `noexec` mount option do?	Prevents execution of binaries on that filesystem

Debugging¶

Q	A
Process stuck — first tool?	`strace -p PID` to see what syscall it's blocked on
High load, low CPU — what to check?	I/O wait: `iostat -xz 1`
Quick triage sequence?	`uptime` → `dmesg` → `free` → `df` → `iostat` → `ss` → `top`
What does `/proc/PID/fd/` show?	All open file descriptors (files, sockets, pipes)

Drills¶

Drill 1: /proc Exploration (Easy)¶

Q: Find the command line, working directory, and environment variables of your current shell.

Answer

cat /proc/$$/cmdline | tr '\0' ' '
ls -la /proc/$$/cwd
cat /proc/$$/environ | tr '\0' '\n' | head
cat /proc/$$/status | grep -E "Name|State|Pid|VmRSS"

Drill 2: Open File Descriptors (Easy)¶

Q: Find all open file descriptors for a process. Find deleted-but-still-open files system-wide.

Answer

ls -la /proc/PID/fd/
ls /proc/PID/fd/ | wc -l
# Deleted files still holding disk space:
find /proc/*/fd -ls 2>/dev/null | grep deleted

Drill 3: Socket States (Easy)¶

Q: List all listening TCP ports with process names. Count TIME_WAIT connections.

Answer

ss -tlnp
ss -Htn state time-wait | wc -l  # -H omits the header so the count is exact
ss -tn state close-wait          # Bug indicator
ss -s                            # Summary
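
For a per-state breakdown in one pass, the first column of `ss -tan` can be tallied with awk. A sketch, with the assumed name `tcp_state_counts`:

```shell
# Count TCP sockets per state, most common first.
# Reads `ss -tan` output on stdin (the first column is the state).
tcp_state_counts() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Usage: ss -tan | tcp_state_counts
```

A large and growing CLOSE-WAIT count in this output points at the application, while a large TIME-WAIT count is often just the normal aftermath of many short connections.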

Drill 4: Find Disk Hogs (Easy)¶

Q: Find what's consuming disk space. Check inode usage.

Answer

df -h
du -sh /* 2>/dev/null | sort -rh | head -10
du -sh /var/log/* | sort -rh | head -10
df -i                            # Inode usage
find /var -xdev -type f -size +100M   # Large files

Drill 5: systemd Override (Medium)¶

Q: Add memory limits to nginx without editing the original unit file.

Answer

systemctl edit nginx
# Add:
# [Service]
# MemoryMax=1G
# LimitNOFILE=65536

# Or manually:
mkdir -p /etc/systemd/system/nginx.service.d/
echo -e "[Service]\nMemoryMax=1G" > /etc/systemd/system/nginx.service.d/override.conf
systemctl daemon-reload
systemctl restart nginx
systemctl show nginx -p MemoryMax

Drill 6: journalctl Filtering (Medium)¶

Q: Find all errors from nginx in the last hour. Show kernel OOM messages since boot.

Answer

journalctl -u nginx -p err --since "1 hour ago"
journalctl -k | grep -i oom
journalctl -u nginx -o json-pretty -n 1

Drill 7: strace a Stuck Process (Medium)¶

Q: A process is using 0% CPU but is listed as running. Diagnose what it's waiting for.

Answer

strace -p PID
# Shows the syscall it's blocked on (e.g., read, connect, futex)
# Check what the file descriptor points to:
ls -la /proc/PID/fd/N
# If it's a socket, find the remote:
ss -tnp | grep PID

Drill 8: Process Tree and Zombies (Medium)¶

Q: Display the process tree. Find zombie processes and their parents.

Answer

pstree -p
ps aux | awk '$8 ~ /^Z/ {print $0}'   # STAT can be Z, Z+, Zs, ...
# Find parent of zombies:
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /Z/ {print "Zombie PID:", $1, "Parent:", $2}'

Drill 9: cgroup Inspection (Medium)¶

Q: Check what cgroup a Docker container belongs to and its resource limits.

Answer

# Memory limit configured for the container (bytes; 0 = unlimited)
docker inspect CONTAINER --format '{{.HostConfig.Memory}}'
# cgroup that the container's main process belongs to
cat /proc/$(docker inspect CONTAINER --format '{{.State.Pid}}')/cgroup
systemd-cgtop                    # Live resource usage

Drill 10: Performance Triage (Hard)¶

Q: A server has load average 25 but only 4 CPUs. Diagnose whether it's CPU-bound or I/O-bound.

Answer

uptime && nproc                  # Load 25 vs 4 CPUs = 6x overloaded
iostat -xz 1 3                  # Check %iowait and await
# High iowait → disk bottleneck:
iotop -a -b -n 3 | head -20     # Which process?
# Low iowait → CPU contention:
ps aux --sort=-%cpu | head -10   # Who's consuming CPU?
mpstat -P ALL 1 3                # Per-CPU breakdown
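
If iostat and mpstat are not installed, the same signal can be pulled from /proc/stat. The sketch below computes the iowait share from the aggregate `cpu` line; the function name `iowait_pct` is hypothetical. The counters are cumulative since boot, so a single reading is a long-term average; for the current picture, sample twice and compare.

```shell
# Percentage of CPU time spent in iowait, from the aggregate "cpu" line.
# Fields after the label: user nice system idle iowait irq softirq steal ...
iowait_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9; i++) total += $i     # user through steal
        if (total > 0) printf "%.1f\n", $6 * 100 / total
    }'
}

# Usage: iowait_pct < /proc/stat
```

The guest columns are left out of the total because the kernel already counts guest time inside user time.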

Cheat Sheet¶

Process Management¶

ps aux                          # All processes
pgrep -f pattern                # Find by full command line
kill -15 PID                    # Graceful (SIGTERM)
kill -9 PID                     # Force (SIGKILL)
kill -0 PID                     # Check alive

systemd¶

systemctl status/start/stop/restart SERVICE
systemctl enable/disable SERVICE
systemctl list-units --failed
systemctl daemon-reload
journalctl -u SERVICE -f
journalctl -p err -b

Disk & Memory¶

df -h / df -i                   # Space / inodes
du -sh /* | sort -rh | head     # Biggest dirs
free -h                         # Memory
dmesg -T | grep -i oom          # OOM kills

Network¶

ss -tlnp                        # Listening ports
ip addr show                    # IP addresses
ip route show                   # Routes
dig domain +short               # DNS

Performance¶

uptime                          # Load average
iostat -xz 1 3                  # Disk I/O
vmstat 1 5                      # Memory/CPU/IO
top -bn1 | head -20             # Process overview

Permissions¶

chmod 755 file                  # rwxr-xr-x
chown user:group file
find / -perm -4000 -ls          # SUID files

Quick Triage Chain¶

systemctl status → journalctl -u → ss -tlnp → df -h → free -h → top → iostat

Self-Assessment¶

Boot and Kernel¶

I can explain the 5 boot stages (firmware → GRUB → kernel → initramfs → systemd)
I know what initramfs does and when to rebuild it
I understand PID 1's special role
I can use dmesg and systemd-analyze to debug boot issues

Processes and Services¶

I understand process states (R, S, D, Z, T)
I know the difference between SIGTERM and SIGKILL
I can write and manage systemd unit files
I can use drop-in overrides and timers
I can diagnose zombie processes

Filesystems and Storage¶

I understand inodes, hard links, and symlinks
I can diagnose disk full vs inode exhaustion
I can use LVM to create and extend volumes
I know the differences between ext4, XFS, and Btrfs

Memory and Performance¶

I understand MemAvailable vs MemFree
I know what the OOM killer does and how to investigate it
I can use the USE method for performance triage
I can distinguish I/O-bound from CPU-bound load

Networking and Security¶

I can read iptables rules and understand chain order
I can diagnose TCP state issues (CLOSE_WAIT, TIME_WAIT)
I can set up SSH key authentication and tunnels
I understand file permissions, SUID, and umask
I know basic hardening steps (SSH, firewall, sysctl)

Debugging¶

I can use strace to diagnose stuck/slow processes
I can navigate /proc to inspect process state
I can use journalctl to find service errors
I can perform a quick performance triage in 60 seconds

What Happens When You Press the Power Button — Deep dive into the boot sequence
From Init Scripts to systemd — History and evolution of process management
SSH Is More Than You Think — SSH protocol, tunneling, certificates
iptables: Following a Packet Through the Chains — Netfilter deep dive
The /proc Filesystem: Linux's Hidden API — Kernel state inspection
strace: Reading the Matrix — System call debugging
eBPF: The Linux Superpower — In-kernel observability
Linux Storage, LVM, Filesystems and Beyond — Storage stack
Linux Networking: Bridges, Bonds, and VLANs — Advanced networking
Linux Hardening: Closing the Doors — Security practices
