Skip to content

/proc Filesystem - Street-Level Ops

Real-world /proc diagnosis and investigation workflows for production environments.

Task: Find What a Process Is Doing Right Now

# The process is consuming CPU but you don't know why
$ PID=$(pgrep -f "java.*myapp")

# Check the kernel stack (what syscall is it blocked on)
$ cat /proc/$PID/stack
[<0>] futex_wait_queue+0x5a/0x70
[<0>] futex_wait+0x104/0x210
[<0>] do_futex+0x376/0x620
[<0>] __x64_sys_futex+0x12c/0x180
# Blocked on a futex -- waiting for a lock

# Check the wait channel (single-word summary of what it's blocked on)
$ cat /proc/$PID/wchan
futex_wait_queue

# Check each thread's stack (one stack file per TID in /proc/PID/task/)
$ for tid in /proc/$PID/task/*/; do
    echo "=== Thread $(basename $tid) ==="
    cat "$tid/stack" 2>/dev/null | head -3
  done

# Check if the process is in uninterruptible sleep (D state)
$ cat /proc/$PID/status | grep "^State:"
State:  D (disk sleep)
# D state = waiting on I/O, cannot be killed with SIGTERM
# Usually a sign of NFS hang, disk issues, or kernel driver problem

Task: Investigate File Descriptor Leaks

# Symptom: application logging "Too many open files" errors

$ PID=$(pgrep -f myapp)

# Count open FDs
$ ls /proc/$PID/fd/ | wc -l
1021

# Check the limit
$ grep "Max open files" /proc/$PID/limits
Max open files            1024           1048576        files
# Soft limit 1024 -- we're at 1021, almost out

# What types of FDs are open?
$ ls -l /proc/$PID/fd/ | sed 's/.*-> //' | sort | uniq -c | sort -rn | head
    456 socket:[...]
    312 /var/log/myapp/access.log (deleted)
    198 pipe:[...]
     43 /opt/myapp/data/cache/
     12 /dev/null

# 312 FDs pointing to a deleted file! The log was rotated but the
# process still holds the old file open -- the disk space isn't freed

# Check how much space the deleted file is consuming
$ ls -la /proc/$PID/fd/ | grep deleted
lrwx------ 1 appuser appuser 64 Mar 19 ... 4 -> /var/log/myapp/access.log (deleted)

# Get the size of the deleted-but-open file
$ stat /proc/$PID/fd/4 2>/dev/null
  Size: 2147483648    # 2GB of disk held by a deleted file

# Fix: Send SIGHUP to reopen log files (if the app supports it)
$ kill -HUP $PID

# Or for socket leaks, identify which remote hosts:
$ ls -la /proc/$PID/fd/ | grep socket | head -5
lrwx------ 1 appuser appuser 64 ... 15 -> socket:[123456]
$ grep 123456 /proc/net/tcp
   0: 0100007F:1F90 0100007F:C350 08 ...
# State 08 = CLOSE_WAIT -- remote closed but local hasn't

Task: Check Open Network Connections for a Process

$ PID=$(pgrep -f nginx)

# List all socket FDs
$ ls -la /proc/$PID/fd/ 2>/dev/null | grep socket
lrwx------ 1 root root 64 ... 6 -> socket:[78901]
lrwx------ 1 root root 64 ... 7 -> socket:[78902]

# Get socket details from /proc/net/
# TCP connections: /proc/net/tcp (IPv4) and /proc/net/tcp6 (IPv6)
$ awk 'NR>1 {print $2, $3, $4}' /proc/net/tcp | while read local remote state; do
    # Decode hex IP:port
    lip=$(printf "%d.%d.%d.%d" 0x${local:6:2} 0x${local:4:2} 0x${local:2:2} 0x${local:0:2})
    lport=$((16#${local:9:4}))
    echo "$lip:$lport -> state:$state"
  done

# Much easier: use ss with the process filter (reads /proc internally)
$ ss -tnp | grep "pid=$PID"
ESTAB  0  0  10.0.0.5:8080  10.0.0.100:45678  users:(("nginx",pid=1234,fd=6))
ESTAB  0  0  10.0.0.5:8080  10.0.0.101:34567  users:(("nginx",pid=1234,fd=7))

Task: Live-Tune Kernel Parameters Through /proc/sys

Remember: Changes via echo X > /proc/sys/... take effect immediately but are lost on reboot. Always pair with a persistent entry in /etc/sysctl.d/. Convention: use numbered files like 99-tuning.conf so your settings load last and override defaults.

# Scenario: High-traffic web server running out of connection tracking entries
$ dmesg | tail
[123456.789] nf_conntrack: table full, dropping packet

# Check current limit
$ cat /proc/sys/net/netfilter/nf_conntrack_max
65536

# Check current usage
$ cat /proc/sys/net/netfilter/nf_conntrack_count
65530
# Almost full

# Increase immediately (takes effect now, lost on reboot)
$ echo 262144 > /proc/sys/net/netfilter/nf_conntrack_max

# Persist for reboot
$ echo "net.netfilter.nf_conntrack_max = 262144" >> /etc/sysctl.d/99-conntrack.conf

# Common emergency tuning:
# Allow reuse of TIME_WAIT sockets for new outbound connections
# (note: tcp_tw_recycle was removed in kernel 4.12 -- never re-enable it)
$ echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

# Increase local port range (more outbound connections; keep the floor
# above any ports your own daemons listen on to avoid bind collisions)
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   60999
$ echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range

# Reduce swappiness under memory pressure
$ echo 10 > /proc/sys/vm/swappiness

# Increase file descriptor limit system-wide
$ cat /proc/sys/fs/file-nr
5120    0       1048576
# allocated  free  maximum
$ echo 2097152 > /proc/sys/fs/file-max

Task: Understand Memory Breakdown from /proc/meminfo

# Scenario: monitoring alert says 90% memory used, but you don't see it in ps

$ cat /proc/meminfo | grep -E "^(MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|Shmem|SReclaimable|AnonPages|Dirty)"
MemTotal:       16384000 kB
MemFree:          819200 kB       # truly unused (raw free)
MemAvailable:    6553600 kB       # what apps can actually use (free + reclaimable)
Buffers:          409600 kB       # block device metadata cache
Cached:          4915200 kB       # page cache (file contents)
SwapTotal:       4096000 kB
SwapFree:        3891200 kB
Shmem:            204800 kB       # shared memory (tmpfs, IPC)
SReclaimable:     819200 kB       # slab memory the kernel can reclaim
AnonPages:       8601600 kB       # memory used by applications (no file backing)
Dirty:             12288 kB       # modified cache pages not yet written to disk

# The formula:
# MemAvailable ~= MemFree + Buffers + Cached + SReclaimable - Shmem
# (simplified; kernel uses a more nuanced calculation)

# If MemAvailable is low but AnonPages is low too, check Shmem and SReclaimable:
# Large Shmem = tmpfs or shared memory segments (check df -h /dev/shm)
# Large SReclaimable = kernel caches (usually harmless, reclaimed under pressure)

Task: Check CPU Features and Known Bugs

# Check if CPU supports hardware virtualization
$ grep -c "vmx\|svm" /proc/cpuinfo
32
# vmx = Intel VT-x, svm = AMD-V

# Check for CPU vulnerabilities
$ ls /sys/devices/system/cpu/vulnerabilities/
l1tf  mds  meltdown  spec_store_bypass  spectre_v1  spectre_v2  tsx_async_abort

$ for vuln in /sys/devices/system/cpu/vulnerabilities/*; do
    echo "$(basename $vuln): $(cat $vuln)"
  done
meltdown: Not affected
spectre_v1: Mitigation: usercopy/swapgs barriers
spectre_v2: Mitigation: Enhanced IBRS
mds: Not affected

# Check CPU flags for specific features
$ grep -o 'aes\|avx\|avx2\|sse4_2\|rdrand' /proc/cpuinfo | sort -u
aes
avx
avx2
rdrand
sse4_2

Task: Investigate Deleted-but-Open Files Holding Disk Space

One-liner: lsof +L1 -- lists all open files that have been deleted (link count < 1). This is the fastest way to find "ghost" files holding disk space. The size column shows how much space will be freed when the process closes the file or is restarted.

# Symptom: df shows disk full but du doesn't account for all the space
$ df -h /var
/dev/sda1       50G   48G  0G  100% /var

$ du -sh /var/*  2>/dev/null | sort -hr | head -5
12G     /var/log
5G      /var/lib
# Only 17G accounted for -- 31G is "missing"

# Find processes holding deleted files
$ find /proc/*/fd -lname '*(deleted)' 2>/dev/null | while read fd; do
    pid=$(echo "$fd" | cut -d/ -f3)
    size=$(stat -c%s "$fd" 2>/dev/null)
    if [ "$size" -gt 1048576 ] 2>/dev/null; then
        name=$(readlink "$fd")
        cmd=$(cat /proc/$pid/cmdline 2>/dev/null | tr '\0' ' ')
        echo "$((size/1048576))MB PID:$pid $name ($cmd)"
    fi
  done | sort -rn | head
15360MB PID:4567 /var/log/myapp/debug.log (deleted) (java -jar myapp.jar)
8192MB  PID:8901 /var/log/nginx/access.log (deleted) (nginx: worker process)
7168MB  PID:2345 /tmp/upload_cache.dat (deleted) (python3 upload_server.py)

# Fix: restart the processes or (for some apps) truncate the deleted file
# Truncate via /proc (does not require app restart):
$ : > /proc/4567/fd/5    # truncates the deleted file in-place

Task: Read /proc/PID/io for I/O-Heavy Processes

# Find which process is hammering the disk
# Sample I/O stats twice with a 5-second gap
$ PID=4567

$ cat /proc/$PID/io
rchar: 1234567890
wchar: 9876543210
read_bytes: 567890123
write_bytes: 8765432100

$ sleep 5 && cat /proc/$PID/io
rchar: 1234600000
wchar: 9877000000
read_bytes: 567890123
write_bytes: 8766000000

# Delta in 5 seconds:
# write_bytes increased by ~567KB -- about 113KB/s disk write
# rchar increased by ~32KB -- reads are hitting page cache (no read_bytes change)

# Script to monitor all processes' I/O:
$ for pid in /proc/[0-9]*/; do
    p=$(basename "$pid")
    io=$(cat "$pid/io" 2>/dev/null) || continue
    wb=$(echo "$io" | awk '/write_bytes/{print $2}')
    if [ "$wb" -gt 104857600 ] 2>/dev/null; then  # >100MB written
        cmd=$(cat "$pid/cmdline" 2>/dev/null | tr '\0' ' ' | cut -c1-60)
        echo "$((wb/1048576))MB written: PID $p ($cmd)"
    fi
  done | sort -rn | head -10

Task: Investigate Process Memory Details

$ PID=$(pgrep -f myapp)

# Quick memory overview
$ grep -E "^(VmSize|VmRSS|VmSwap|RssAnon|RssFile|RssShmem)" /proc/$PID/status
VmSize:   2345678 kB     # total virtual address space
VmRSS:     890123 kB     # physical memory in use
VmSwap:     12345 kB     # memory pushed to swap
RssAnon:   678901 kB     # anonymous (heap, stack, mmap)
RssFile:   201222 kB     # file-backed (shared libraries, mapped files)
RssShmem:   10000 kB     # shared memory

# Find the biggest memory-mapped regions
$ awk '/^[0-9a-f]/{region=$0} /^Rss:/{if($2>1024) print $2, "kB", region}' \
    /proc/$PID/smaps | sort -rn | head -5
524288 kB 7f8a10000000-7f8a30000000 rw-p 00000000 00:00 0
131072 kB 55a3b4000000-55a3bc000000 rw-p 00000000 00:00 0    [heap]
 65536 kB 7f8a08000000-7f8a0c000000 r--s 00000000 08:01 1234  /opt/myapp/data/index.db

# Check for memory leaks over time (repeat periodically)
$ while true; do
    rss=$(awk '/VmRSS/{print $2}' /proc/$PID/status)
    echo "$(date +%H:%M:%S) RSS: ${rss}kB"
    sleep 60
  done

Task: Enter a Container's Namespace via /proc

Under the hood: Containers are just processes with isolated namespaces. /proc/<PID>/ns/ contains symlinks to each namespace (net, mnt, pid, etc.). nsenter works by opening these namespace file descriptors and calling setns() to join them. This is why you can debug any container from the host without installing tools inside it.

# Find the container's PID on the host
$ PID=$(docker inspect --format '{{.State.Pid}}' mycontainer)

# Enter the container's network namespace to debug networking
$ nsenter -t $PID -n ss -tlnp
# Shows the container's listening sockets

# Enter the container's mount namespace to see its filesystem
$ nsenter -t $PID -m ls /app/

# Enter all namespaces (equivalent to docker exec but works with any container runtime)
$ nsenter -t $PID -m -u -i -n -p -- /bin/bash

# Compare host vs container namespace inodes
$ readlink /proc/1/ns/net       # host
net:[4026531992]
$ readlink /proc/$PID/ns/net    # container
net:[4026532456]
# Different inode = different namespace = isolated