
Linux Memory Management

Scope

This document explains Linux memory management from the point of view of an admin / DevOps / performance engineer. It covers:

  • virtual memory and address spaces
  • page tables
  • anonymous memory vs page cache
  • reclaim
  • slab/slub caches
  • NUMA
  • swap
  • OOM killer
  • huge pages
  • common failure modes

This is not kernel-hacker source commentary. It is the practical deep dive you need to stop treating memory as a single mystery number.


Big picture

Linux memory is not just "RAM used by processes." Memory is divided among several competing uses:

  • process private memory
  • shared memory
  • page cache
  • kernel memory
  • slab caches
  • filesystem metadata caches
  • network buffers
  • vmalloc / direct map structures
  • pinned memory
  • huge pages
  • swap-backed pages and swap cache

Core mental model

physical RAM
  -> managed in pages
  -> mapped into virtual address spaces
  -> used for anonymous memory and file-backed cache
  -> reclaimed under pressure
  -> spilled to swap in some cases
  -> protected / partitioned by policies, zones, NUMA, and cgroups

There is no single "free RAM good, used RAM bad" rule. Linux intentionally uses spare RAM for caching. The real question is whether reclaim is healthy and whether pressure causes latency or kills.
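That claim is easy to check on any Linux box: MemAvailable (the kernel's own estimate of how much memory could be handed out without swapping) is usually far larger than MemFree. A quick sketch against /proc/meminfo:

```shell
# MemFree is literally-idle RAM; MemAvailable adds in memory the kernel
# expects it can reclaim cheaply (clean page cache, reclaimable slab).
mem_free=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
mem_avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
echo "MemFree:      ${mem_free} kB"
echo "MemAvailable: ${mem_avail} kB"
```

On a cache-heavy server the gap between the two numbers is exactly the "used but reclaimable" memory this section is describing.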


Virtual memory basics

Each process sees a virtual address space, not direct raw physical memory.

Why virtual memory exists

It provides:

  • isolation between processes
  • a stable address abstraction
  • lazy allocation
  • memory-mapped files
  • page sharing
  • copy-on-write behavior after fork()
  • kernel/user separation

Address translation

A memory access goes roughly:

virtual address
  -> page table walk (or TLB hit)
  -> physical page frame
  -> actual memory access

CPU MMUs and TLBs make this efficient.

Important consequences

  • processes can each think they use the same addresses
  • a process can reserve more virtual space than has physical backing
  • a mapped file page and an anonymous heap page are not the same thing operationally

Pages and page size

Memory is managed in units called pages.

The common base page size on x86_64 is 4 KiB; larger sizes (2 MiB and 1 GiB huge pages) also exist.
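You can confirm the base page size from userspace; a one-liner sketch:

```shell
# getconf queries the system configuration; PAGESIZE is in bytes
# (4096 on typical x86_64).
page_size=$(getconf PAGESIZE)
echo "base page size: ${page_size} bytes"
```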

Why pages matter

Reclaim, allocation, mapping, faults, swapping, and page cache are all page-oriented.

Larger pages

  • Transparent Huge Pages (THP)
  • explicit huge pages / hugetlb

Larger pages can reduce TLB pressure but may increase fragmentation or operational complexity.


Anonymous memory vs file-backed memory

This distinction is fundamental.

Anonymous memory

Not backed by a filesystem file.

Examples:

  • heap allocations
  • stack
  • most runtime object allocations
  • anonymous shared mappings

Anonymous memory cannot simply be dropped; it can only be reclaimed by writing it out to swap, and only if swap is enabled and policy allows.

File-backed memory

Backed by files.

Examples:

  • executable text segments
  • shared libraries
  • memory-mapped files
  • page cache for file IO

Clean file-backed pages can often be dropped and reloaded from disk, making them cheaper to reclaim than dirty anonymous memory.
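The split shows up directly in /proc/meminfo: AnonPages counts anonymous memory, while Cached covers file-backed pages (plus Shmem/tmpfs, which behaves more like anonymous memory for reclaim). A rough sketch:

```shell
# All values are in kB. Cached includes Shmem (tmpfs), which cannot be
# dropped and reread from disk, so subtract it when estimating how much
# cache is cheaply reclaimable.
anon=$(awk '/^AnonPages:/ {print $2}' /proc/meminfo)
cached=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
shmem=$(awk '/^Shmem:/ {print $2}' /proc/meminfo)
echo "anonymous: ${anon} kB, cached: ${cached} kB (of which shmem: ${shmem} kB)"
```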

Why admins get confused

They see "used memory" and panic, but much of it may be reclaimable page cache rather than unrecoverable process private state.


Page cache

Page cache stores file data in memory to avoid repeated disk IO.

This is one of the biggest reasons Linux appears to "use all RAM."

Why page cache is good

  • speeds reads
  • can buffer writes before flush
  • improves filesystem performance enormously
  • is automatically managed by the kernel

Dirty vs clean pages

  • clean page cache can usually be dropped
  • dirty page cache must be written back before reclaim

Important reality

If memory pressure grows, page cache is often reclaimed first because it is cheaper than murdering active anonymous memory.


Slab / slub allocator and kernel memory

Kernel memory is not just one blob. The kernel maintains caches for common object types:

  • dentries
  • inodes
  • task structures
  • networking objects
  • various metadata

This is often shown as slab memory.

Why it matters

A system can look memory-stressed not because app heap exploded, but because kernel object caches or pinned kernel allocations grew.

Tools

  • /proc/slabinfo
  • slabtop
  • /proc/meminfo
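/proc/slabinfo usually requires root, but /proc/meminfo splits slab into a reclaimable part and a pinned part, which is often the number you actually need:

```shell
# SReclaimable: slab caches (dentries, inodes, ...) the kernel can shrink
# under memory pressure.
# SUnreclaim: kernel slab memory that reclaim cannot touch.
srecl=$(awk '/^SReclaimable:/ {print $2}' /proc/meminfo)
sunrecl=$(awk '/^SUnreclaim:/ {print $2}' /proc/meminfo)
echo "reclaimable slab: ${srecl} kB, unreclaimable slab: ${sunrecl} kB"
```

A large and growing SUnreclaim with a stable application heap is the classic signature of the "kernel object caches grew" case described above.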

Allocation and page faults

Memory is often allocated lazily.

Example:

  • process calls malloc()
  • allocator reserves virtual address range
  • physical pages may not be committed until first touch
  • on access, a page fault occurs and the kernel maps/fills a page

Minor vs major faults

  • minor fault: resolved without storage IO; the kernel only needs to establish a mapping (e.g. a fresh zeroed page, or a page already resident in memory)
  • major fault: the data must be brought in from storage, e.g. a file read or a swap-in

High fault rates can be normal or pathological depending on type and workload phase.
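Per-process fault counters are cumulative in /proc/&lt;pid&gt;/stat (field 10 is minor faults, field 12 is major faults); `ps -o min_flt,maj_flt` reports the same counters. A sketch, with the caveat that field positions assume the comm field contains no spaces:

```shell
# Here awk reads its own /proc/self/stat, so the counts are for the
# freshly started awk process: some minor faults from faulting in its
# own pages, and usually zero major faults on a warm system.
read -r minor major <<EOF
$(awk '{print $10, $12}' /proc/self/stat)
EOF
echo "minor faults: ${minor}, major faults: ${major}"
```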


Zones and watermarks

Linux divides physical memory into zones for hardware/architecture constraints.

Typical x86 concepts include:

  • DMA
  • DMA32
  • Normal
  • Movable

The kernel maintains per-zone watermarks and reserves so that critical allocations can still make forward progress under pressure.

Why you care

You can have "memory available" overall but still fail specific allocations due to fragmentation or zone constraints.
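The fragmentation side of this is visible in /proc/buddyinfo: each column is the count of free blocks at a given order (order 0 = one page, order 1 = two contiguous pages, and so on). Lots of free order-0 pages alongside empty high-order columns means large contiguous allocations can still fail. A guarded sketch:

```shell
# One line per zone, per node; columns are free block counts by order.
cat /proc/buddyinfo 2>/dev/null || true
zone_count=$( [ -r /proc/buddyinfo ] && wc -l < /proc/buddyinfo || echo 0 )
echo "zones listed: ${zone_count}"
```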


Reclaim

When free memory falls, the kernel tries to reclaim pages.

Broadly, reclaim prefers pages that are cheaper to recover:

  • cold page cache
  • reclaimable slab
  • inactive anonymous pages, possibly to swap

Active and inactive lists

Linux tracks page activity heuristically to decide what is likely cold enough to reclaim.

Direct reclaim vs background reclaim

  • background reclaim is performed by kernel threads such as kswapd
  • direct reclaim happens when an allocating thread gets dragged into reclaim work itself

Direct reclaim is bad news for latency. It means your app is now helping clean the mess instead of doing its job.
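Whether reclaim is happening in the background or directly in your application's allocation path shows up in /proc/vmstat (a sketch; counter names as on modern kernels):

```shell
# pgscan_kswapd / pgsteal_kswapd: background reclaim done by kswapd.
# pgscan_direct / pgsteal_direct: reclaim performed synchronously by the
# allocating thread itself -- the latency-hostile case.
grep -E '^(pgscan|pgsteal)_(kswapd|direct) ' /proc/vmstat || true
direct_scans=$(awk '/^pgscan_direct / {print $2}' /proc/vmstat)
echo "pages scanned in direct reclaim: ${direct_scans:-0}"
```

These are counters since boot, so sample them twice and diff; a rising pgscan_direct during a latency incident is strong evidence of the problem described above.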


Writeback

Dirty file-backed pages must eventually be written to storage.

This matters because reclaim and writeback interact:

  • heavy dirtying can pressure memory
  • slow storage can stall reclaim
  • writeback throttling can hurt app latency

Common symptoms:

  • memory pressure with lots of dirty/writeback pages
  • storage bottleneck masquerading as a memory problem
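Both symptoms can be watched from /proc: the Dirty and Writeback fields in meminfo, and the vm.dirty_* sysctls that bound how much dirty data may accumulate before writers are throttled:

```shell
dirty=$(awk '/^Dirty:/ {print $2}' /proc/meminfo)
writeback=$(awk '/^Writeback:/ {print $2}' /proc/meminfo)
echo "Dirty: ${dirty} kB, Writeback: ${writeback} kB"
# Thresholds (percent of reclaimable memory): background flushing starts
# at the first; at the second, writers are throttled into doing their
# own writeback. Some setups use dirty_bytes/dirty_background_bytes
# instead, in which case the ratios read 0.
echo "dirty_background_ratio=$(cat /proc/sys/vm/dirty_background_ratio)"
echo "dirty_ratio=$(cat /proc/sys/vm/dirty_ratio)"
```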

Swap

Swap allows anonymous memory pages to be evicted from RAM to swap space.

What swap is not

Swap is not "always bad." It is a tradeoff tool.

Benefits

  • gives reclaim another option besides OOM
  • can smooth transient spikes
  • lets cold anonymous pages leave RAM
  • can keep page cache available for IO-heavy workloads

Costs

  • if the system actively churns swapped pages, latency can become catastrophic
  • swap-on-slow-disk is painful
  • bad swappiness tuning can hide memory pressure until it becomes a swamp monster

Key subtlety

File-backed pages usually do not need swap; they can often just be dropped. Anonymous memory is the main swap candidate.
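The relevant knob and counters live in /proc as well (a sketch; see current kernel docs for exact vm.swappiness semantics on your version):

```shell
# swappiness biases reclaim between anonymous pages (swap them out) and
# page cache (drop it); higher values favor swapping anon pages.
swappiness=$(cat /proc/sys/vm/swappiness)
swap_total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
echo "swappiness=${swappiness}"
echo "swap in use: $((swap_total - swap_free)) kB of ${swap_total} kB"
```

Note the distinction from the subtlety above: a static, moderate "swap in use" figure is often fine (cold anon pages parked out of the way); sustained swap-in/swap-out rates in `vmstat 1` (si/so columns) are the actual thrash signal.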


OOM killer

If reclaim cannot free enough memory, Linux may invoke the OOM killer.

The OOM killer selects a victim using a badness score (driven largely by memory footprint) combined with per-process adjustments such as oom_score_adj.

Important points

  • the process using the most RSS is not automatically the victim
  • cgroup-scoped OOM can kill within a memory-limited container even if the host has RAM
  • "Killed" in app logs is often your only clue if you were not watching kernel messages

Common causes

  • memory leak
  • bad limits
  • container memory ceiling too low
  • runaway tmpfs / page cache patterns in constrained environments
  • too many concurrent workers
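When it happens, the kernel log is the source of truth (`dmesg`, `journalctl -k`). The victim line has a fixed shape; this sketch parses a representative sample line (not live output) to extract the victim's name:

```shell
# Representative kernel OOM line; the real one appears in dmesg as
# "Out of memory: Killed process <pid> (<comm>) total-vm:... anon-rss:...".
sample='Out of memory: Killed process 4211 (java) total-vm:18874368kB, anon-rss:16252928kB'
victim=$(printf '%s\n' "$sample" | sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p')
echo "OOM victim: ${victim}"
```

The anon-rss vs total-vm figures in that line are also the fastest way to distinguish "huge heap" from "huge virtual reservation" after the fact.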

NUMA

On NUMA systems, memory is not equally close to every CPU. Access to local node memory is faster than remote memory.

Why NUMA matters

  • poor placement increases latency and reduces throughput
  • workloads can become bottlenecked despite plenty of total RAM
  • CPU pinning without memory locality awareness can hurt badly

Policies and tools

  • numactl
  • NUMA balancing
  • memory policy controls
  • cgroup cpuset interactions
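Node topology and hit/miss counters are exposed under sysfs even without numactl installed (a sketch; a single-socket VM will typically show one node):

```shell
# Each NUMA node appears as /sys/devices/system/node/nodeN. Its numastat
# file counts numa_hit (allocation satisfied on the intended node) and
# numa_miss (spilled to a remote node).
node_count=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l)
echo "NUMA nodes visible: ${node_count}"
head -n 2 /sys/devices/system/node/node0/numastat 2>/dev/null || true
```

A steadily climbing numa_miss on a latency-sensitive box is the measurable form of "poor placement" described above; `numactl --hardware` and `numastat` summarize the same data.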

Transparent Huge Pages

THP lets the kernel use larger pages automatically where beneficial.

Benefits

  • fewer page table entries
  • lower TLB pressure
  • better performance for some memory-intensive workloads

Risks

  • latency spikes from compaction
  • weird interactions with databases or memory-sensitive systems
  • fragmentation side effects

This is why some environments disable THP for certain workloads.
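The current THP mode is readable (and, for those workloads, changeable) via sysfs; the active mode is the bracketed word:

```shell
# Typical content: "always [madvise] never" -- brackets mark the active
# mode. The file is absent on kernels built without THP.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
thp_mode=$(cat "$thp_file" 2>/dev/null)
echo "THP: ${thp_mode:-not exposed on this kernel}"
```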


Memory in containers and cgroups

Container memory behavior is one of the most misunderstood production topics.

Key facts

  • container memory limits are enforced via cgroups
  • reclaim happens within that policy context
  • page cache counts too, not just heap
  • a container can hit OOM while the host still looks comfortable overall
  • limits without requests/planning create chaos

Common surprise

A service does moderate file IO inside a container, page cache grows, limit is hit, and everyone screams "memory leak" even though the process heap is not the main issue.
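On a cgroup v2 system you can see exactly what a cgroup is being charged for, page cache included. A guarded sketch that resolves the current shell's own cgroup (assumes v2 mounted at /sys/fs/cgroup; falls through cleanly otherwise):

```shell
# Under cgroup v2, /proc/self/cgroup contains a single "0::<path>" line.
rel=$(awk -F'::' '/^0::/ {print $2}' /proc/self/cgroup 2>/dev/null)
cur="/sys/fs/cgroup${rel}/memory.current"  # usage in bytes, cache included
max="/sys/fs/cgroup${rel}/memory.max"      # limit in bytes, or "max"
if [ -r "$cur" ]; then
  echo "memory.current: $(cat "$cur") bytes, memory.max: $(cat "$max" 2>/dev/null)"
else
  echo "cgroup v2 memory files not readable here"
fi
```

Comparing memory.current against the anon/file breakdown in the sibling memory.stat file is how you prove the "it's page cache, not a leak" case below.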


Reading /proc/meminfo sanely

Important fields:

  • MemTotal
  • MemFree
  • MemAvailable
  • Buffers
  • Cached
  • Slab
  • SReclaimable
  • SUnreclaim
  • Dirty
  • Writeback
  • AnonPages
  • Mapped
  • SwapTotal
  • SwapFree

Use MemAvailable, not just MemFree

MemFree alone is almost useless for judging whether the system is healthy.

Linux tries to use RAM. That is not failure. Reclaim stalls, swap thrash, and OOM are failure.
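A one-liner sketch that turns those fields into the number you actually want, the fraction of RAM the kernel believes it could hand out without swapping:

```shell
# Integer percentage of MemTotal that is MemAvailable.
avail_pct=$(awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {print int(a * 100 / t)}' /proc/meminfo)
echo "approximately ${avail_pct}% of RAM is available"
```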


Common production failure patterns

1. High memory usage but system fine

Likely healthy cache usage. Check:

  • MemAvailable
  • swap activity
  • reclaim pressure
  • IO latency

2. High load and terrible latency with no obvious CPU bottleneck

Could be:

  • direct reclaim
  • writeback congestion
  • swap thrash
  • NUMA remote access
  • compaction / THP trouble

3. Container OOM on a healthy-looking node

Likely:

  • cgroup limit too low
  • page cache inside cgroup
  • wrong request/limit sizing
  • burst memory pattern not budgeted

4. "Free memory vanished"

Maybe page cache, slab growth, or tmpfs growth. That is a diagnostic branch, not a conclusion.

5. IO issue that looks like memory issue

Dirty pages pile up, reclaim stalls, app hangs. Storage and memory are in a toxic codependent relationship.


Practical debugging workflow

Step 1 - determine pressure, not just usage

Check:

  • free -h
  • /proc/meminfo
  • vmstat 1
  • sar -B
  • PSI memory pressure if available
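PSI deserves special mention because it measures pressure directly instead of inferring it from usage. If the kernel exposes it (4.20+ with PSI enabled), it reads like this (guarded sketch):

```shell
# "some" = share of time at least one task was stalled on memory;
# "full" = share of time all non-idle tasks were stalled.
# avg10/avg60/avg300 are decaying averages over those windows, in percent.
if [ -r /proc/pressure/memory ]; then
  psi=$(cat /proc/pressure/memory)
else
  psi="PSI not available on this kernel"
fi
echo "$psi"
```

Nonzero "full" averages are about the strongest "memory pressure is hurting work" signal Linux offers.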

Step 2 - separate categories

Ask:

  • anonymous memory high?
  • page cache high?
  • slab high?
  • dirty/writeback high?
  • swap active?
  • cgroup-limited?

Step 3 - look for symptoms of pain

  • major faults
  • swap in/out
  • direct reclaim
  • OOM messages
  • elevated IO wait
  • application latency spikes

Step 4 - account for containers and cgroups

Host totals can hide cgroup-local disasters.


Interview angles

Useful questions hidden here:

  • difference between page cache and process memory
  • what MemAvailable means
  • why Linux uses spare RAM
  • how swap helps and hurts
  • what causes OOM
  • what THP is
  • why a container can OOM even when host RAM exists
  • what NUMA is and why locality matters
  • what direct reclaim is and why latency suffers

Strong answers separate usage, reclaimability, and pressure.


Mental model to keep

Memory on Linux is an economy, not a bucket.

Pages are constantly being:

  • allocated
  • mapped
  • dirtied
  • cached
  • reclaimed
  • swapped
  • compacted
  • pinned
  • fought over by the OOM killer

The right question is not "how much is used?"

It is:

  • which pages are being used for what,
  • how expensive are they to reclaim,
  • and is the system under memory pressure severe enough to hurt work?
