/proc Filesystem Footguns¶
Mistakes that cause security exposure, misdiagnosis, lost tuning, or silent breakage.
1. /proc/PID/environ is readable by the same user (information leak)¶
Any process running as the same UID can read another process's environment variables. If you store database passwords, API keys, or tokens in environment variables, any compromised process running as the same user can harvest them.
# As appuser, read the environment of any other appuser process
$ cat /proc/$(pgrep -f "java.*myapp")/environ | tr '\0' '\n' | grep PASSWORD
DATABASE_PASSWORD=hunter2
Modern kernels gate /proc/PID/environ behind a ptrace access-mode check, so only the process owner and root can read it. But same-user access is exactly the risk in shared-UID deployments: every process you run under one account can read every other's environment.
Fix: Use secrets injection that writes to files (tmpfs/ramfs) with strict permissions (0400) rather than environment variables. In Kubernetes, mount secrets as files instead of env vars. If env vars are necessary, run each service with a distinct UID.
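A minimal sketch of the file-based approach, assuming a tmpfs mount (here /dev/shm; the path and secret value are illustrative):

```shell
# Sketch: deliver a secret as an owner-read-only file on tmpfs instead of an
# environment variable. Path and value are examples, not a real layout.
umask 077                          # new files get no group/other permissions
secret_file=/dev/shm/db_password
printf '%s' 'hunter2' > "$secret_file"
chmod 0400 "$secret_file"          # owner read-only
# The app reads this file at startup; nothing appears in /proc/PID/environ
cat "$secret_file"
```

This is essentially what a Kubernetes secret volume mount gives you for free: the secret lands on a tmpfs inside the pod, never in the environment.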
2. Writing to /proc/sys without persisting (lost on reboot)¶
You tune net.core.somaxconn or vm.swappiness via /proc/sys/ during an incident. The system reboots next week and reverts to defaults. The problem returns and nobody remembers the fix.
# This works now but is gone after reboot
echo 65535 > /proc/sys/net/core/somaxconn
# This persists
echo "net.core.somaxconn = 65535" >> /etc/sysctl.d/99-tuning.conf
sysctl --system
Fix: Every write to /proc/sys/ must have a corresponding entry in /etc/sysctl.d/ or /etc/sysctl.conf. Treat any ad-hoc /proc/sys/ write as a temporary measure and immediately document the persistent version.
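A quick audit sketch for catching drift between the running value and the persisted config (the key is an example; the grep over /etc/sysctl.d and /etc/sysctl.conf is a rough check, not a full sysctl parser):

```shell
# Sketch: compare a live kernel value against what will survive a reboot.
key=net.core.somaxconn
running=$(cat "/proc/sys/$(echo "$key" | tr . /)")   # dots -> path components
persisted=$(grep -rhs "^[[:space:]]*${key}[[:space:]]*=" /etc/sysctl.d /etc/sysctl.conf 2>/dev/null \
            | tail -1 | sed 's/.*=[[:space:]]*//')
echo "running=${running} persisted=${persisted:-<none>}"
```

If persisted shows <none> for a value you tuned during an incident, that fix evaporates at the next reboot.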
3. /proc inside containers showing host information¶
Some /proc entries inside a container still reflect the host, not the container. procfs is a global view of the kernel and is not cgroup-aware, so mounting it inside a container does not scope memory or CPU figures to the container's limits.
# Inside a container with 512MB memory limit:
$ cat /proc/meminfo | grep MemTotal
MemTotal: 65536000 kB
# Shows the HOST's 64GB, not the container's 512MB limit
# The container's actual memory limit is in cgroup files:
$ cat /sys/fs/cgroup/memory.max # cgroup v2
536870912 # 512MB
# Similarly, /proc/cpuinfo shows ALL host CPUs, not the container's CPU limit
$ grep -c "^processor" /proc/cpuinfo
64
# Even if the container is limited to 2 CPUs
Applications that size thread pools or caches based on /proc/meminfo or /proc/cpuinfo will over-allocate. JVMs before Java 10, Go runtimes, and many monitoring agents get this wrong.
Fix: Applications should check cgroup limits, not /proc/meminfo. For the JVM, use -XX:+UseContainerSupport (default since Java 10). For Go, use github.com/KimMachineGun/automemlimit. For monitoring, read /sys/fs/cgroup/ instead of /proc/.
War story: Before Java 10 added container awareness, JVMs in containers routinely sized their heaps from host memory. A container with a 512MB limit on a 64GB host would set -Xmx to ~16GB (the default is 1/4 of "available" memory), immediately get OOM-killed by the kernel, and enter CrashLoopBackOff. This was so widespread that it drove the JDK-8146115 fix and the -XX:+UseContainerSupport flag.
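The fix in practice can be sketched as a cgroup-aware lookup, assuming the standard cgroup v2 and v1 mount points (the function name is mine):

```shell
# Sketch: container-aware memory limit (cgroup v2, then v1, then host total).
container_mem_bytes() {
    if [ -r /sys/fs/cgroup/memory.max ]; then                      # cgroup v2
        local v; v=$(cat /sys/fs/cgroup/memory.max)
        [ "$v" != "max" ] && { echo "$v"; return; }                # "max" = no limit
    elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then  # cgroup v1
        # Note: an unlimited v1 cgroup reports a huge sentinel value
        cat /sys/fs/cgroup/memory/memory.limit_in_bytes; return
    fi
    # No cgroup limit found: fall back to host MemTotal (kB -> bytes)
    awk '/^MemTotal:/ {print $2 * 1024}' /proc/meminfo
}
container_mem_bytes
```

Anything sizing a heap, cache, or worker pool should start from this number, not from /proc/meminfo.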
4. Parsing /proc/meminfo wrong (MemAvailable vs MemFree)¶
MemFree is the raw unused memory. MemAvailable is the kernel's estimate of how much memory is available for new applications (including reclaimable caches and buffers). Using MemFree for capacity planning causes false alarms because Linux intentionally uses free RAM for caching.
$ grep -E "^(MemTotal|MemFree|MemAvailable)" /proc/meminfo
MemTotal: 16384000 kB
MemFree: 409600 kB # "Only 400MB free! We're out of memory!"
MemAvailable: 8192000 kB # Actually 8GB available for apps
Fix: Always use MemAvailable (present since Linux 3.14). If your monitoring alert fires on a MemFree percentage, it is misconfigured. The formula the free command uses for "available" changed in 2014 for exactly this reason.
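A sketch of a correctly-targeted check (the 10% threshold is illustrative):

```shell
# Sketch: alert on MemAvailable, not MemFree.
avail_pct=$(awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {printf "%d", a*100/t}' /proc/meminfo)
echo "MemAvailable: ${avail_pct}% of MemTotal"
if [ "$avail_pct" -lt 10 ]; then
    echo "ALERT: less than 10% of memory available for new allocations"
fi
```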
5. Relying on /proc/PID/cmdline (can be overwritten)¶
A process can modify its own argv[0] at runtime, changing what appears in /proc/PID/cmdline. This means cmdline is not a reliable indicator of what binary is actually running.
# A process can disguise itself
# In C: strcpy(argv[0], "[kworker/0:0]");
# In Python: the setproctitle library rewrites argv the same way
# (prctl PR_SET_NAME changes /proc/PID/comm, a separate 15-char field)
# You see what looks like a kernel thread:
$ cat /proc/12345/cmdline | tr '\0' ' '
[kworker/0:0]
# But the actual binary is:
$ readlink /proc/12345/exe
/tmp/.hidden/cryptominer
Fix: Use /proc/PID/exe (symlink to the actual binary on disk) for identity verification, not cmdline. For forensics, also check /proc/PID/maps to see which shared libraries are loaded.
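A sketch of the exe-based check (the function name is mine; the self-check on the current shell is just for illustration):

```shell
# Sketch: trust /proc/PID/exe, not cmdline, when confirming what is running.
verify_exe() {
    local pid=$1 expected=$2 actual
    actual=$(readlink "/proc/${pid}/exe") || return 1   # fails if process exited
    [ "$actual" = "$expected" ]
}
# Check this shell against its own exe link (trivially true, for illustration)
verify_exe $$ "$(readlink /proc/$$/exe)" && echo "identity confirmed"
```

Note that if the on-disk binary was deleted after launch, the exe link gains a " (deleted)" suffix, which is itself a useful forensic signal.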
6. Race conditions reading /proc (process can exit)¶
A process can exit between the time you list /proc/ and the time you read its files. This causes "No such file or directory" errors in scripts that iterate over all processes.
# This script will occasionally fail:
for pid in /proc/[0-9]*/; do
cat "$pid/status" # FAILS if process exited
done
# Worse: the PID could be recycled and you read a DIFFERENT process's data
# Between listing and reading, PID 12345 (your target) exits,
# and PID 12345 is reassigned to a new unrelated process
Fix: Always handle errors when reading /proc/PID/ files. Use 2>/dev/null or check return codes. For critical operations, verify the process identity (e.g., check cmdline or exe) after reading to confirm you got the right process.
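A sketch of a scan that tolerates processes exiting mid-iteration:

```shell
# Sketch: iterate over /proc without erroring on processes that vanish.
for dir in /proc/[0-9]*/; do
    pid=${dir#/proc/}; pid=${pid%/}
    # The process may be gone by now; skip quietly instead of failing
    name=$(awk '/^Name:/ {print $2; exit}' "${dir}status" 2>/dev/null) || continue
    [ -n "$name" ] || continue
    echo "$pid $name"
done
```

For the PID-reuse case, re-read an identity field (exe, start time from /proc/PID/stat) after your main read and bail out if it changed.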
7. /proc/net/tcp hex format confusion¶
/proc/net/tcp stores IP addresses and ports in hexadecimal, with the IP in little-endian byte order on little-endian systems. Parsing it wrong gives you backwards IP addresses.
$ cat /proc/net/tcp | head -2
sl local_address rem_address st ...
0: 0100007F:0CEA 00000000:0000 0A ...
# 0100007F = 127.0.0.1 -- read left-to-right it looks like 1.0.0.127, which is wrong
# The address is stored little-endian: reverse the bytes: 7F.00.00.01 -> 127.0.0.1
# Port 0CEA = 3306 (decimal) -- this is MySQL listening on localhost
# State codes:
# 01 = ESTABLISHED, 02 = SYN_SENT, 06 = TIME_WAIT, 0A = LISTEN
Fix: Use ss or netstat instead of parsing /proc/net/tcp directly. If you must parse it (embedded systems, minimal containers), use a tested parsing function that handles byte order correctly. IPv6 addresses in /proc/net/tcp6 are even more confusing (128 bits in hex).
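If you do have to decode the fields by hand, the byte swap can be sketched in bash (function names are mine; substring slicing is a bashism):

```shell
# Sketch: decode the hex address/port fields from /proc/net/tcp.
hex2ip() {   # IPv4 is stored little-endian: emit the four bytes in reverse
    printf '%d.%d.%d.%d' "0x${1:6:2}" "0x${1:4:2}" "0x${1:2:2}" "0x${1:0:2}"
}
hex2port() { # the port is plain hex, no byte swap needed
    printf '%d' "0x$1"
}
echo "$(hex2ip 0100007F):$(hex2port 0CEA)"   # 127.0.0.1:3306
```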
8. Treating /proc/PID/io read counts as disk I/O¶
rchar and wchar in /proc/PID/io count bytes passed through read() and write() syscalls, including page cache hits. They are not disk I/O. read_bytes and write_bytes are the actual storage I/O, but even these can be misleading because they count I/O submitted to the block layer, not necessarily completed.
$ cat /proc/1234/io
rchar: 50000000000 # 50GB "read" (mostly from cache)
read_bytes: 100000000 # 100MB actually read from disk
Fix: Use read_bytes and write_bytes for disk I/O analysis. Use rchar/wchar for understanding syscall volume. For accurate disk I/O attribution, use iotop or BPF-based tools (biosnoop from bcc-tools) which trace block layer events directly.
Debug clue: If rchar is 100x read_bytes, the process is reading heavily from page cache (fast). If read_bytes is close to rchar, the process is generating real disk I/O (slow). This ratio is a quick indicator of whether your working set fits in memory.
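The ratio check can be sketched like this, using the current shell's own PID as a stand-in for a real target (reading another process's io file requires owning it or being root):

```shell
# Sketch: rchar vs read_bytes ratio for one process.
io=/proc/$$/io
rchar=$(awk '/^rchar:/ {print $2}' "$io")
read_bytes=$(awk '/^read_bytes:/ {print $2}' "$io")
echo "rchar=${rchar} read_bytes=${read_bytes}"
if [ "$read_bytes" -gt 0 ]; then
    echo "ratio: $((rchar / read_bytes))x"    # high ratio = mostly page cache
else
    echo "no completed disk reads yet -- everything came from page cache"
fi
```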