Linux Memory Management — Street Ops¶
Real-world operational scenarios for memory problems. These are the situations that wake you up at 3 AM: OOM kills, memory leaks, swap storms, and NUMA imbalances.
Diagnosing OOM Kills¶
Scenario: Your application keeps getting killed and restarting¶
Step 1: Confirm it was an OOM kill
# Check dmesg for OOM messages
$ dmesg -T | grep -i "out of memory\|oom\|killed process"
[Thu Mar 19 03:22:15 2026] Out of memory: Killed process 12345 (java) total-vm:8388608kB, anon-rss:4194304kB, file-rss:32768kB, shmem-rss:0kB, UID:1000 pgtables:8192kB oom_score_adj:0
# Check journal for OOM events
$ journalctl -k --grep="oom|killed process" --since "24 hours ago"
# Get the full OOM dump (shows what triggered it and memory state)
$ dmesg -T | grep -A 30 "invoked oom-killer"
Step 2: Understand the OOM dump
# Key lines from the OOM dump:
Mar 19 03:22:15 server01 kernel: java invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar 19 03:22:15 server01 kernel: Mem-Info:
Mar 19 03:22:15 server01 kernel: active_anon:1048576 inactive_anon:524288 ...
Mar 19 03:22:15 server01 kernel: Node 0 Normal free:2048kB min:16384kB low:20480kB high:24576kB
Mar 19 03:22:15 server01 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,...
Mar 19 03:22:15 server01 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 19 03:22:15 server01 kernel: [12345] 1000 12345 8388608 4194304 8388608 0 0 java
Mar 19 03:22:15 server01 kernel: [ 567] 0 567 45678 12345 98304 0 -1000 sshd
Mar 19 03:22:15 server01 kernel: Out of memory: Killed process 12345 (java) total-vm:8388608kB, anon-rss:4194304kB
Key things to extract:

- Which process was killed and its memory usage (RSS)
- What triggered the kill (the process that made the failing allocation request)
- System memory state at the time (free, active, inactive)
- oom_score_adj — was the process protected?
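That extraction can be scripted for recurring incidents; a minimal sketch (the `oom_summary` helper name is made up, and the field positions assume the dump format shown above):

```shell
# Hypothetical helper: summarize OOM kills from dmesg-style input on stdin.
# Prints "<pid> (<name>) anon-rss:<kB>" per kill, per the dump format above.
oom_summary() {
  awk '/Out of memory: Killed process/ {
    for (i = 1; i <= NF; i++) {
      if ($i == "process") { pid = $(i + 1); name = $(i + 2) }
      if ($i ~ /^anon-rss:/) { rss = $i; sub(/,$/, "", rss) }
    }
    print pid, name, rss
  }'
}

# Usage: dmesg -T | oom_summary
```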
Step 3: Prevent recurrence
# Option A: Give the system more memory/swap
$ sudo fallocate -l 4G /swapfile2
$ sudo chmod 600 /swapfile2
$ sudo mkswap /swapfile2
$ sudo swapon /swapfile2
# Option B: Limit the application's memory (cgroups/systemd)
$ sudo systemctl edit myapp.service
# Add:
[Service]
MemoryMax=4G
MemoryHigh=3G
# Option C: Protect critical services from OOM
$ sudo systemctl edit sshd.service
[Service]
OOMScoreAdjust=-900
# Option D: Fix the application's memory leak (see below)
Tuning Swappiness¶
Scenario: Database server has high latency during memory pressure¶
# Check current swappiness
$ cat /proc/sys/vm/swappiness
60 # Default — too aggressive for database servers
# Check if the system is actively swapping
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 1 524288 65536 32768 2097152 256 512 1024 2048 1500 3000 30 5 50 15 0
# ^^^ ^^^
# si/so > 0 means active swapping
# Check swap usage per process
$ for pid in /proc/[0-9]*; do
p=$(basename $pid)
swap=$(awk '/Swap:/ {sum+=$2} END {print sum}' "$pid/smaps" 2>/dev/null)
[ "$swap" -gt 0 ] 2>/dev/null && echo "$swap kB PID=$p $(cat $pid/comm 2>/dev/null)"
done | sort -rn | head -10
# Tune swappiness
$ sudo sysctl vm.swappiness=10
# Make permanent
$ echo "vm.swappiness=10" | sudo tee /etc/sysctl.d/60-swappiness.conf
$ sudo sysctl --system
# For databases, also consider:
$ sudo sysctl vm.vfs_cache_pressure=50 # Less aggressive inode/dentry reclaim
Monitoring Memory Pressure (PSI)¶
Scenario: Need an early warning system for memory problems¶
# Check current memory pressure
$ cat /proc/pressure/memory
some avg10=2.50 avg60=1.20 avg300=0.80 total=98765432
full avg10=0.10 avg60=0.05 avg300=0.02 total=1234567
# Interpretation:
# some avg10=2.50 → 2.5% of the last 10s, at least one task was stalled on memory
# full avg10=0.10 → 0.1% of the last 10s, ALL tasks were stalled
# Warning thresholds: some > 10%, full > 1% indicates significant pressure
# Per-cgroup pressure (containers)
$ cat /sys/fs/cgroup/system.slice/docker-abc123.scope/memory.pressure
some avg10=15.00 avg60=8.50 avg300=3.20 total=456789012
# This specific container is under memory pressure
# Monitor in real-time
$ watch -n 1 'echo "=== System ===" && cat /proc/pressure/memory && echo && echo "=== CPU ===" && cat /proc/pressure/cpu && echo && echo "=== IO ===" && cat /proc/pressure/io'
# Set up PSI-based monitoring trigger (kernel 5.2+)
# This creates a file descriptor that triggers when pressure exceeds threshold
# Useful for custom monitoring scripts
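A true PSI trigger registers a threshold by writing to the pressure file and waiting with poll(2), which needs more than shell. As a polling approximation, the avg10 values can simply be sampled; a minimal sketch (the `psi_check` helper is made up, thresholds mirror the rule of thumb above):

```shell
# Sample avg10 from a pressure file and warn past the rule-of-thumb thresholds.
psi_check() {  # usage: psi_check [pressure-file]
  awk -v some_max=10 -v full_max=1 '
    /^some/ { split($2, a, "="); some = a[2] }
    /^full/ { split($2, a, "="); full = a[2] }
    END {
      bad = 0
      if (some + 0 > some_max) { print "WARN some avg10=" some; bad = 1 }
      if (full + 0 > full_max) { print "WARN full avg10=" full; bad = 1 }
      exit bad
    }' "${1:-/proc/pressure/memory}"
}

# psi_check || notify "memory pressure high"   # hypothetical notify hook
```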
Finding Memory Leaks¶
Scenario: Application RSS keeps growing, suspected memory leak¶
# Track RSS growth over time
$ while true; do
rss=$(ps -o rss= -p 12345)
echo "$(date +%H:%M:%S) RSS: ${rss} kB"
sleep 60
done | tee /tmp/rss-track.log
# Get detailed memory map
$ pmap -x 12345 | sort -k2 -rn | head -20
# Look for anonymous mappings that are growing
# Detailed view with smaps
$ cat /proc/12345/smaps | awk '
/^[0-9a-f]/ { region=$0 }
/^Rss:/ { rss=$2; if(rss > 10240) print rss " kB " region }
' | sort -rn | head -20
# Compare smaps over time to find growing regions
$ cat /proc/12345/smaps_rollup
Rss: 2097152 kB
Pss: 2000000 kB
Shared_Clean: 65536 kB
Shared_Dirty: 4096 kB
Private_Clean: 16384 kB
Private_Dirty: 2011136 kB # <-- High private dirty = heap growth
Swap: 0 kB
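The over-time comparison can be automated; a minimal sketch (the `rss_delta` helper name is made up; it assumes `smaps_rollup` is available, i.e. kernel 4.14+):

```shell
# Snapshot Private_Dirty from smaps_rollup twice; growth between
# snapshots usually means heap growth, per the note above.
rss_delta() {  # usage: rss_delta <pid> [interval-seconds]
  pid=$1 interval=${2:-60}
  a=$(awk '/^Private_Dirty:/ {print $2}' "/proc/$pid/smaps_rollup")
  sleep "$interval"
  b=$(awk '/^Private_Dirty:/ {print $2}' "/proc/$pid/smaps_rollup")
  echo "Private_Dirty: $a kB -> $b kB (delta $((b - a)) kB)"
}

# rss_delta 12345 300   # compare over 5 minutes
```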
# For Java apps, use jmap/jcmd
$ jcmd 12345 GC.heap_info
$ jmap -histo 12345 | head -20
# For native leaks, use valgrind (development) or heaptrack
$ heaptrack -p 12345
# Let it run for a while, then analyze
$ heaptrack_gui heaptrack.12345.*.zst
# Quick check: is it actually leaked or just cached?
$ cat /proc/12345/status | grep -E "VmRSS|VmSwap|VmData|VmStk"
VmRSS: 2097152 kB
VmSwap: 0 kB
VmData: 4194304 kB # <-- Virtual data segment (heap)
VmStk: 8192 kB
# If VmData >> VmRSS, the process allocated a lot but isn't using it all
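That comparison reduces to a single ratio; a minimal sketch (`vm_ratio` is a made-up helper name):

```shell
# How much of the data segment (heap) is actually resident?
# A ratio well above 1x points at allocated-but-untouched (or
# allocator-retained) address space rather than a classic leak.
vm_ratio() {  # usage: vm_ratio <pid>
  awk '/^VmRSS:/ {rss = $2} /^VmData:/ {data = $2}
       END { if (rss) printf "VmData/VmRSS = %.1fx (%d kB / %d kB)\n",
                             data / rss, data, rss }' "/proc/$1/status"
}
```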
cgroup Memory Debugging in Containers¶
Scenario: Container keeps getting OOMKilled despite having "enough" memory¶
# Check container's memory limit and current usage
$ docker stats --no-stream mycontainer
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM %
abc123 mycontainer 15.2% 480MiB / 512MiB 93.75%
# Detailed cgroup stats
# First find the container's cgroup path. On cgroup v1, grep the memory
# controller line in /proc/<pid>/cgroup; on cgroup v2 there is a single 0:: entry:
$ docker inspect mycontainer --format '{{.State.Pid}}'
12345
$ cat /proc/12345/cgroup
0::/system.slice/docker-abc123.scope
$ cat /sys/fs/cgroup/system.slice/docker-abc123.scope/memory.current
503316480 # ~480 MB
$ cat /sys/fs/cgroup/system.slice/docker-abc123.scope/memory.max
536870912 # 512 MB limit
# Check what's using the memory INSIDE the cgroup
$ cat /sys/fs/cgroup/system.slice/docker-abc123.scope/memory.stat
anon 450000000 # Anonymous (heap) memory
file 50000000 # Page cache
kernel 3000000 # Kernel memory (slab, page tables, etc.)
slab 2000000 # Slab allocations
sock 500000 # Socket buffers
# ... more fields
# The kernel memory counts toward the limit too!
# anon + file + kernel = total → if close to limit, OOM risk
# Check for OOM events in this cgroup
$ cat /sys/fs/cgroup/system.slice/docker-abc123.scope/memory.events
low 0
high 15 # Hit memory.high 15 times (throttled)
max 3 # Usage hit memory.max 3 times (reclaim forced at the limit)
oom 2 # OOM killer invoked 2 times
oom_kill 2 # Processes killed 2 times
oom_group_kill 0
# Fix: increase the limit or reduce usage
$ docker update --memory=1g --memory-swap=1g mycontainer
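A cheap alert on new OOM kills can diff the `oom_kill` counter; a minimal sketch (the cgroup path is the example scope from above, and the watch loop is shown commented out):

```shell
# Parse the oom_kill counter from a cgroup v2 memory.events file.
oom_kills() { awk '$1 == "oom_kill" {print $2}' "$1"; }

# Hypothetical watch loop against the example container's scope:
# events=/sys/fs/cgroup/system.slice/docker-abc123.scope/memory.events
# last=$(oom_kills "$events")
# while sleep 30; do
#   now=$(oom_kills "$events")
#   [ "$now" -gt "$last" ] && echo "ALERT: $((now - last)) new OOM kill(s)"
#   last=$now
# done
```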
Common gotcha: kernel memory eating into the container limit¶
# Containers with many small files or network connections
# can have high kernel (slab) memory usage
$ cat /sys/fs/cgroup/system.slice/docker-abc123.scope/memory.stat | grep kernel
kernel 52428800 # 50 MB of kernel memory!
# This counts toward the memory.max limit
# If your app uses 460 MB and kernel uses 50 MB, you need > 510 MB limit
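To see how close the anon + file + kernel split is to the limit, the stat file can be summarized; a minimal sketch (`mem_breakdown` is a made-up helper; it assumes `memory.max` holds a number rather than the string `max`):

```shell
# Print anon/file/kernel as a share of the cgroup's memory.max.
mem_breakdown() {  # usage: mem_breakdown <memory.stat> <limit-bytes>
  awk -v max="$2" '
    $1 == "anon" || $1 == "file" || $1 == "kernel" {
      printf "%-7s %11d bytes (%4.1f%% of limit)\n", $1, $2, $2 * 100 / max
    }' "$1"
}

# cg=/sys/fs/cgroup/system.slice/docker-abc123.scope
# mem_breakdown "$cg/memory.stat" "$(cat "$cg/memory.max")"
```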
NUMA Imbalance Troubleshooting¶
Scenario: Database performance degrades on a multi-socket server¶
# Check NUMA topology
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65536 MB
node 0 free: 8192 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 45056 MB # <-- Huge imbalance! Node 1 is barely used
# Check NUMA stats
$ numastat
node0 node1
numa_hit 89012345 12345678
numa_miss 0 5678901 # <-- Remote allocations!
numa_foreign 5678901 0
# Check per-process NUMA allocation
$ numastat -p $(pgrep postgres)
Per-node process memory usage (in MBs) for PID 12345 (postgres)
Node 0 Node 1 Total
--------------- --------------- ---------------
Huge 0.00 0.00 0.00
Heap 2048.00 128.00 2176.00
Stack 0.12 0.00 0.12
Private 50000.00 5000.00 55000.00
# Almost all memory on Node 0 — if postgres CPUs span both nodes, remote access is slow
# Fix: bind the database to one NUMA node
$ sudo numactl --cpunodebind=0 --membind=0 /usr/bin/postgres ...
# Or in systemd:
$ sudo systemctl edit postgresql.service
[Service]
ExecStart=
ExecStart=/usr/bin/numactl --cpunodebind=0 --membind=0 /usr/bin/postgres -D /var/lib/postgresql/data
# For a more balanced approach with interleaving:
$ sudo numactl --interleave=all /usr/bin/postgres ...
Clearing Page Cache Safely¶
Gotcha: High `buff/cache` in `free -h` is not a memory problem. Linux deliberately uses free RAM for page cache because unused RAM is wasted RAM. The `available` column is what matters -- it shows how much memory can be reclaimed for applications. Operators who routinely drop caches "to free memory" are actively hurting performance.
Scenario: Need to benchmark disk I/O without page cache interference¶
# ALWAYS sync first to flush dirty pages to disk
$ sync
# Drop page cache only (safest)
$ echo 1 | sudo tee /proc/sys/vm/drop_caches
# Drop dentries and inodes too
$ echo 2 | sudo tee /proc/sys/vm/drop_caches
# Drop everything
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
# Verify
$ free -h
# buff/cache should be much lower now
When it's safe:

- Benchmarking (need a cold cache)
- Investigating memory usage without cache noise
- Emergency: the system is thrashing and you need to free memory fast
When it's NOT safe:

- Routinely in production (the cache will rebuild, causing an I/O storm)
- As a "fix" for high memory usage (the cache is supposed to use memory)
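When benchmarking, it can be worth recording how much cache a drop actually reclaims; a minimal sketch (`cache_kb` is a made-up helper; the drop itself needs root, so it is shown commented out):

```shell
# Read the page-cache size in kB from /proc/meminfo.
cache_kb() { awk '/^Cached:/ {print $2}' /proc/meminfo; }

# As root, measure the effect of a drop:
# before=$(cache_kb)
# sync && echo 1 > /proc/sys/vm/drop_caches
# echo "freed $((before - $(cache_kb))) kB of page cache"
```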
Emergency: System is Thrashing (Swap Storm)¶
Scenario: System is extremely slow, load average is 50+, everything swapping¶
# Confirm it's a swap storm
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 45 4194304 16384 1024 32768 8192 4096 8192 4096 5000 2000 5 10 0 85 0
# ^^ ^^^^ ^^^^ ^^
# blocked processes heavy swapping I/O wait
# sar can show historical swap activity
$ sar -W 1 5
12:00:01 AM pswpin/s pswpout/s
12:00:02 AM 8192.00 4096.00 # <-- Thousands of pages/sec
# Find which processes are consuming the most memory
$ ps aux --sort=-%mem | head -10
# Find which processes have the most swap
$ for pid in /proc/[0-9]*; do
p=$(basename $pid)
swap=$(awk '/VmSwap:/ {print $2}' "$pid/status" 2>/dev/null)
[ "${swap:-0}" -gt 0 ] && echo "$swap kB $p $(cat $pid/comm 2>/dev/null)"
done | sort -rn | head -10
# Emergency actions (in order of aggression):
# 1. Kill the obvious memory hog
$ kill 12345 # Graceful
$ kill -9 12345 # If graceful doesn't work
# 2. Drop page cache to free some breathing room
$ sync && echo 3 > /proc/sys/vm/drop_caches
# 3. Temporarily reduce swappiness to curb further swap-out churn
#    (already-swapped pages still fault back in on access)
$ sysctl vm.swappiness=0
# 4. If you can identify the process, use cgroup to limit it
$ systemctl set-property myapp.service MemoryMax=2G
# 5. Last resort: add emergency swap
$ sudo fallocate -l 4G /emergency-swap
$ sudo chmod 600 /emergency-swap
$ sudo mkswap /emergency-swap
$ sudo swapon -p -1 /emergency-swap # Low priority
Transparent Huge Pages Causing Database Latency¶
Scenario: Redis/MongoDB has periodic latency spikes (every few seconds to minutes)¶
# Check THP status
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
# Check compaction activity (THP needs contiguous memory)
$ grep -i compact /proc/vmstat
compact_stall 12345 # <-- Process was stalled waiting for compaction
compact_success 8000
compact_fail 4345
compact_migrate_scanned 5678901
# If compact_stall is high, THP compaction is causing latency
# Check specific to Redis: it logs a warning at startup when THP is enabled
$ grep -i "transparent huge pages" /var/log/redis/redis-server.log # log path varies by distro
# WARNING you have Transparent Huge Pages (THP) support enabled in your kernel
# Disable THP
$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
# Make persistent across reboots
$ cat <<'EOF' | sudo tee /etc/systemd/system/disable-thp.service
[Unit]
Description=Disable Transparent Huge Pages (THP)
DefaultDependencies=no
After=sysinit.target local-fs.target
Before=basic.target
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag'
[Install]
WantedBy=basic.target
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl enable disable-thp.service
# Verify latency improvement
$ redis-cli --latency -h localhost
# Before: avg 0.5ms, max 150ms
# After: avg 0.3ms, max 2ms
Reading /proc/meminfo Under Pressure¶
Scenario: Monitoring alert says "low memory" — is it real?¶
# Quick assessment
$ free -h
total used free shared buff/cache available
Mem: 62Gi 58Gi 256Mi 128Mi 4Gi 3.5Gi
Swap: 7.8Gi 2.1Gi 5.7Gi
# available = 3.5Gi → system has 3.5 GB before it's truly in trouble
# Swap used = 2.1 GB → some swapping, but could be idle pages
# Is the swap usage active or stale?
$ vmstat 1 5
# If si/so are 0, swap contains stale pages — not a problem
# If si/so are nonzero, active swapping — investigate
# Detailed breakdown
$ awk '
/MemTotal:/ {total=$2}
/MemAvailable:/ {avail=$2}
/Buffers:/ {buf=$2}
/^Cached:/ {cache=$2}
/SwapTotal:/ {stotal=$2}
/SwapFree:/ {sfree=$2}
/Slab:/ {slab=$2}
/SReclaimable:/ {srec=$2}
END {
printf "Total: %8d MB\n", total/1024
printf "Available: %8d MB (%.1f%%)\n", avail/1024, avail*100/total
printf "Buffers: %8d MB\n", buf/1024
printf "Cache: %8d MB\n", cache/1024
printf "Slab: %8d MB (reclaimable: %d MB)\n", slab/1024, srec/1024
printf "Swap used: %8d MB / %d MB\n", (stotal-sfree)/1024, stotal/1024
}
' /proc/meminfo
# If available is > 10% of total, the system is fine
# If available is 1-5%, start investigating
# If available is < 1%, take action now
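Those thresholds can be turned into a check suitable for cron or a monitoring agent; a minimal sketch (`mem_check` is a made-up name; it treats 1-10% as the investigate band):

```shell
# Exit 0 when fine (>10% available), 1 to investigate (1-10%),
# 2 when action is needed now (<1%).
mem_check() {  # usage: mem_check [meminfo-file]
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
       END {
         pct = a * 100 / t
         printf "available: %.1f%%\n", pct
         if (pct < 1) exit 2
         if (pct < 10) exit 1
       }' "${1:-/proc/meminfo}"
}

# mem_check || alert "low memory on $(hostname)"   # hypothetical alert hook
```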