
Linux Memory Management

Scope

This document explains Linux memory management from the point of view of an admin / DevOps / performance engineer. It covers:

  • virtual memory and address spaces
  • page tables
  • anonymous memory vs page cache
  • reclaim
  • slab/slub caches
  • NUMA
  • swap
  • OOM killer
  • huge pages
  • common failure modes

This is not kernel-hacker source commentary. It is the practical deep dive you need to stop treating memory as a single mystery number.


Big picture

Linux memory is not just "RAM used by processes." Memory is divided among several competing uses:

  • process private memory
  • shared memory
  • page cache
  • kernel memory
  • slab caches
  • filesystem metadata caches
  • network buffers
  • vmalloc / direct map structures
  • pinned memory
  • huge pages
  • swap-backed pages and swap cache

Core mental model

physical RAM
  -> managed in pages
  -> mapped into virtual address spaces
  -> used for anonymous memory and file-backed cache
  -> reclaimed under pressure
  -> spilled to swap in some cases
  -> protected / partitioned by policies, zones, NUMA, and cgroups

There is no single "free RAM good, used RAM bad" rule. Linux intentionally uses spare RAM for caching. The real question is whether reclaim is healthy and whether pressure causes latency or kills.
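That claim is easy to check on any Linux box: MemAvailable (the kernel's own estimate of how much memory could be handed out without swapping) is usually far larger than MemFree. A quick sketch against /proc/meminfo:

```shell
# MemFree is literally-idle RAM; MemAvailable adds in memory the kernel
# expects it can reclaim cheaply (clean page cache, reclaimable slab).
mem_free=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
mem_avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
echo "MemFree:      ${mem_free} kB"
echo "MemAvailable: ${mem_avail} kB"
```

On a cache-heavy server the gap between the two numbers is exactly the "used but reclaimable" memory this section is describing.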


Virtual memory basics

Each process sees a virtual address space, not direct raw physical memory.

Why virtual memory exists

It provides:

  • isolation between processes
  • a stable address abstraction
  • lazy allocation
  • memory-mapped files
  • page sharing
  • copy-on-write behavior after fork()
  • kernel/user separation

Address translation

A memory access goes roughly:

virtual address
  -> page table walk (or TLB hit)
  -> physical page frame
  -> actual memory access

CPU MMUs and TLBs make this efficient.

Important consequences

  • processes can each think they use the same addresses
  • a process can reserve more virtual space than has physical backing
  • a mapped file page and an anonymous heap page are not the same thing operationally

Pages and page size

Memory is managed in units called pages.

The common base page size on x86_64 is 4 KiB; larger sizes (2 MiB and 1 GiB huge pages) also exist.
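You can confirm the base page size from userspace; a one-liner sketch:

```shell
# getconf queries the system configuration; PAGESIZE is in bytes
# (4096 on typical x86_64).
page_size=$(getconf PAGESIZE)
echo "base page size: ${page_size} bytes"
```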

Why pages matter

Reclaim, allocation, mapping, faults, swapping, and page cache are all page-oriented.

Larger pages

  • Transparent Huge Pages (THP)
  • explicit huge pages / hugetlb

Larger pages can reduce TLB pressure but may increase fragmentation or operational complexity.


Anonymous memory vs file-backed memory

This distinction is fundamental.

Anonymous memory

Not backed by a filesystem file.

Examples:

  • heap allocations
  • stack
  • most runtime object allocations
  • anonymous shared mappings

Anonymous memory cannot simply be dropped; it can only be reclaimed by writing it out to swap, and only if swap is enabled and policy allows.

File-backed memory

Backed by files.

Examples:

  • executable text segments
  • shared libraries
  • memory-mapped files
  • page cache for file IO

Clean file-backed pages can often be dropped and reloaded from disk, making them cheaper to reclaim than dirty anonymous memory.
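The split shows up directly in /proc/meminfo: AnonPages counts anonymous memory, while Cached covers file-backed pages (plus Shmem/tmpfs, which behaves more like anonymous memory for reclaim). A rough sketch:

```shell
# All values are in kB. Cached includes Shmem (tmpfs), which cannot be
# dropped and reread from disk, so subtract it when estimating how much
# cache is cheaply reclaimable.
anon=$(awk '/^AnonPages:/ {print $2}' /proc/meminfo)
cached=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
shmem=$(awk '/^Shmem:/ {print $2}' /proc/meminfo)
echo "anonymous: ${anon} kB, cached: ${cached} kB (of which shmem: ${shmem} kB)"
```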

Why admins get confused

They see "used memory" and panic, but much of it may be reclaimable page cache rather than unrecoverable process private state.


Page cache

Page cache stores file data in memory to avoid repeated disk IO.

This is one of the biggest reasons Linux appears to "use all RAM."

Why page cache is good

  • speeds reads
  • can buffer writes before flush
  • improves filesystem performance enormously
  • is automatically managed by the kernel

Dirty vs clean pages

  • clean page cache can usually be dropped
  • dirty page cache must be written back before reclaim

Important reality

If memory pressure grows, page cache is often reclaimed first because it is cheaper than murdering active anonymous memory.


Slab / slub allocator and kernel memory

Kernel memory is not just one blob. The kernel maintains caches for common object types:

  • dentries
  • inodes
  • task structures
  • networking objects
  • various metadata

This is often shown as slab memory.

Why it matters

A system can look memory-stressed not because app heap exploded, but because kernel object caches or pinned kernel allocations grew.

Tools

  • /proc/slabinfo
  • slabtop
  • /proc/meminfo
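/proc/slabinfo usually requires root, but /proc/meminfo splits slab into a reclaimable part and a pinned part, which is often the number you actually need:

```shell
# SReclaimable: slab caches (dentries, inodes, ...) the kernel can shrink
# under memory pressure.
# SUnreclaim: kernel slab memory that reclaim cannot touch.
srecl=$(awk '/^SReclaimable:/ {print $2}' /proc/meminfo)
sunrecl=$(awk '/^SUnreclaim:/ {print $2}' /proc/meminfo)
echo "reclaimable slab: ${srecl} kB, unreclaimable slab: ${sunrecl} kB"
```

A large and growing SUnreclaim with a stable application heap is the classic signature of the "kernel object caches grew" case described above.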

Allocation and page faults

Memory is often allocated lazily.

Example:

  • process calls malloc()
  • allocator reserves virtual address range
  • physical pages may not be committed until first touch
  • on access, a page fault occurs and the kernel maps/fills a page

Minor vs major faults

  • minor fault: resolved without storage IO; the kernel only needs to establish a mapping (e.g. a fresh zeroed page, or a page already resident in memory)
  • major fault: the data must be brought in from storage, e.g. a file read or a swap-in

High fault rates can be normal or pathological depending on type and workload phase.
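Per-process fault counters are cumulative in /proc/&lt;pid&gt;/stat (field 10 is minor faults, field 12 is major faults); `ps -o min_flt,maj_flt` reports the same counters. A sketch, with the caveat that field positions assume the comm field contains no spaces:

```shell
# Here awk reads its own /proc/self/stat, so the counts are for the
# freshly started awk process: some minor faults from faulting in its
# own pages, and usually zero major faults on a warm system.
read -r minor major <<EOF
$(awk '{print $10, $12}' /proc/self/stat)
EOF
echo "minor faults: ${minor}, major faults: ${major}"
```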


Zones and watermarks

Linux divides physical memory into zones for hardware/architecture constraints.

Typical x86 concepts include:

  • DMA
  • DMA32
  • Normal
  • Movable

The kernel maintains per-zone watermarks and reserves so that critical allocations can still make forward progress under pressure.

Why you care

You can have "memory available" overall but still fail specific allocations due to fragmentation or zone constraints.
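The fragmentation side of this is visible in /proc/buddyinfo: each column is the count of free blocks at a given order (order 0 = one page, order 1 = two contiguous pages, and so on). Lots of free order-0 pages alongside empty high-order columns means large contiguous allocations can still fail. A guarded sketch:

```shell
# One line per zone, per node; columns are free block counts by order.
cat /proc/buddyinfo 2>/dev/null || true
zone_count=$( [ -r /proc/buddyinfo ] && wc -l < /proc/buddyinfo || echo 0 )
echo "zones listed: ${zone_count}"
```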


Reclaim

When free memory falls, the kernel tries to reclaim pages.

Broadly, reclaim prefers pages that are cheaper to recover:

  • cold page cache
  • reclaimable slab
  • inactive anonymous pages, possibly to swap

Active and inactive lists

Linux tracks page activity heuristically to decide what is likely cold enough to reclaim.

Direct reclaim vs background reclaim

  • background reclaim is performed by kernel threads such as kswapd
  • direct reclaim happens when an allocating thread gets dragged into reclaim work itself

Direct reclaim is bad news for latency. It means your app is now helping clean the mess instead of doing its job.
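Whether reclaim is happening in the background or directly in your application's allocation path shows up in /proc/vmstat (a sketch; counter names as on modern kernels):

```shell
# pgscan_kswapd / pgsteal_kswapd: background reclaim done by kswapd.
# pgscan_direct / pgsteal_direct: reclaim performed synchronously by the
# allocating thread itself -- the latency-hostile case.
grep -E '^(pgscan|pgsteal)_(kswapd|direct) ' /proc/vmstat || true
direct_scans=$(awk '/^pgscan_direct / {print $2}' /proc/vmstat)
echo "pages scanned in direct reclaim: ${direct_scans:-0}"
```

These are counters since boot, so sample them twice and diff; a rising pgscan_direct during a latency incident is strong evidence of the problem described above.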


Writeback

Dirty file-backed pages must eventually be written to storage.

This matters because reclaim and writeback interact:

  • heavy dirtying can pressure memory
  • slow storage can stall reclaim
  • writeback throttling can hurt app latency

Common symptoms:

  • memory pressure with lots of dirty/writeback pages
  • storage bottleneck masquerading as a memory problem
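Both symptoms can be watched from /proc: the Dirty and Writeback fields in meminfo, and the vm.dirty_* sysctls that bound how much dirty data may accumulate before writers are throttled:

```shell
dirty=$(awk '/^Dirty:/ {print $2}' /proc/meminfo)
writeback=$(awk '/^Writeback:/ {print $2}' /proc/meminfo)
echo "Dirty: ${dirty} kB, Writeback: ${writeback} kB"
# Thresholds (percent of reclaimable memory): background flushing starts
# at the first; at the second, writers are throttled into doing their
# own writeback. Some setups use dirty_bytes/dirty_background_bytes
# instead, in which case the ratios read 0.
echo "dirty_background_ratio=$(cat /proc/sys/vm/dirty_background_ratio)"
echo "dirty_ratio=$(cat /proc/sys/vm/dirty_ratio)"
```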

Swap

Swap allows anonymous memory pages to be evicted from RAM to swap space.

What swap is not

Swap is not "always bad." It is a tradeoff tool.

Benefits

  • gives reclaim another option besides OOM
  • can smooth transient spikes
  • lets cold anonymous pages leave RAM
  • can keep page cache available for IO-heavy workloads

Costs

  • if the system actively churns swapped pages, latency can become catastrophic
  • swap-on-slow-disk is painful
  • bad swappiness tuning can hide memory pressure until it becomes a swamp monster

Key subtlety

File-backed pages usually do not need swap; they can often just be dropped. Anonymous memory is the main swap candidate.
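The relevant knob and counters live in /proc as well (a sketch; see current kernel docs for exact vm.swappiness semantics on your version):

```shell
# swappiness biases reclaim between anonymous pages (swap them out) and
# page cache (drop it); higher values favor swapping anon pages.
swappiness=$(cat /proc/sys/vm/swappiness)
swap_total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
echo "swappiness=${swappiness}"
echo "swap in use: $((swap_total - swap_free)) kB of ${swap_total} kB"
```

Note the distinction from the subtlety above: a static, moderate "swap in use" figure is often fine (cold anon pages parked out of the way); sustained swap-in/swap-out rates in `vmstat 1` (si/so columns) are the actual thrash signal.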


OOM killer

If reclaim cannot free enough memory, Linux may invoke the OOM killer.

The OOM killer selects a victim using a badness score (driven largely by memory footprint) combined with per-process adjustments such as oom_score_adj.

Important points

  • the process using the most RSS is not automatically the victim
  • cgroup-scoped OOM can kill within a memory-limited container even if the host has RAM
  • "Killed" in app logs is often your only clue if you were not watching kernel messages

Common causes

  • memory leak
  • bad limits
  • container memory ceiling too low
  • runaway tmpfs / page cache patterns in constrained environments
  • too many concurrent workers
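When it happens, the kernel log is the source of truth (`dmesg`, `journalctl -k`). The victim line has a fixed shape; this sketch parses a representative sample line (not live output) to extract the victim's name:

```shell
# Representative kernel OOM line; the real one appears in dmesg as
# "Out of memory: Killed process <pid> (<comm>) total-vm:... anon-rss:...".
sample='Out of memory: Killed process 4211 (java) total-vm:18874368kB, anon-rss:16252928kB'
victim=$(printf '%s\n' "$sample" | sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p')
echo "OOM victim: ${victim}"
```

The anon-rss vs total-vm figures in that line are also the fastest way to distinguish "huge heap" from "huge virtual reservation" after the fact.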

NUMA

On NUMA systems, memory is not equally close to every CPU. Access to local node memory is faster than remote memory.

Why NUMA matters

  • poor placement increases latency and reduces throughput
  • workloads can become bottlenecked despite plenty of total RAM
  • CPU pinning without memory locality awareness can hurt badly

Policies and tools

  • numactl
  • NUMA balancing
  • memory policy controls
  • cgroup cpuset interactions
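Node topology and hit/miss counters are exposed under sysfs even without numactl installed (a sketch; a single-socket VM will typically show one node):

```shell
# Each NUMA node appears as /sys/devices/system/node/nodeN. Its numastat
# file counts numa_hit (allocation satisfied on the intended node) and
# numa_miss (spilled to a remote node).
node_count=$(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l)
echo "NUMA nodes visible: ${node_count}"
head -n 2 /sys/devices/system/node/node0/numastat 2>/dev/null || true
```

A steadily climbing numa_miss on a latency-sensitive box is the measurable form of "poor placement" described above; `numactl --hardware` and `numastat` summarize the same data.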

Transparent Huge Pages

THP lets the kernel use larger pages automatically where beneficial.

Benefits

  • fewer page table entries
  • lower TLB pressure
  • better performance for some memory-intensive workloads

Risks

  • latency spikes from compaction
  • weird interactions with databases or memory-sensitive systems
  • fragmentation side effects

This is why some environments disable THP for certain workloads.
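The current THP mode is readable (and, for those workloads, changeable) via sysfs; the active mode is the bracketed word:

```shell
# Typical content: "always [madvise] never" -- brackets mark the active
# mode. The file is absent on kernels built without THP.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
thp_mode=$(cat "$thp_file" 2>/dev/null)
echo "THP: ${thp_mode:-not exposed on this kernel}"
```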


Memory in containers and cgroups

Container memory behavior is one of the most misunderstood production topics.

Key facts

  • container memory limits are enforced via cgroups
  • reclaim happens within that policy context
  • page cache counts too, not just heap
  • a container can hit OOM while the host still looks comfortable overall
  • limits without requests/planning create chaos

Common surprise

A service does moderate file IO inside a container, page cache grows, limit is hit, and everyone screams "memory leak" even though the process heap is not the main issue.
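On a cgroup v2 system you can see exactly what a cgroup is being charged for, page cache included. A guarded sketch that resolves the current shell's own cgroup (assumes v2 mounted at /sys/fs/cgroup; falls through cleanly otherwise):

```shell
# Under cgroup v2, /proc/self/cgroup contains a single "0::<path>" line.
rel=$(awk -F'::' '/^0::/ {print $2}' /proc/self/cgroup 2>/dev/null)
cur="/sys/fs/cgroup${rel}/memory.current"  # usage in bytes, cache included
max="/sys/fs/cgroup${rel}/memory.max"      # limit in bytes, or "max"
if [ -r "$cur" ]; then
  echo "memory.current: $(cat "$cur") bytes, memory.max: $(cat "$max" 2>/dev/null)"
else
  echo "cgroup v2 memory files not readable here"
fi
```

Comparing memory.current against the anon/file breakdown in the sibling memory.stat file is how you prove the "it's page cache, not a leak" case below.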


Reading /proc/meminfo sanely

Important fields:

  • MemTotal
  • MemFree
  • MemAvailable
  • Buffers
  • Cached
  • Slab
  • SReclaimable
  • SUnreclaim
  • Dirty
  • Writeback
  • AnonPages
  • Mapped
  • SwapTotal
  • SwapFree

Use MemAvailable, not just MemFree

MemFree alone is almost useless for judging whether the system is healthy.

Linux tries to use RAM. That is not failure. Reclaim stalls, swap thrash, and OOM are failure.
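A one-liner sketch that turns those fields into the number you actually want, the fraction of RAM the kernel believes it could hand out without swapping:

```shell
# Integer percentage of MemTotal that is MemAvailable.
avail_pct=$(awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {print int(a * 100 / t)}' /proc/meminfo)
echo "approximately ${avail_pct}% of RAM is available"
```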


Common production failure patterns

1. High memory usage but system fine

Likely healthy cache usage. Check:

  • MemAvailable
  • swap activity
  • reclaim pressure
  • IO latency

2. High load and terrible latency with no obvious CPU bottleneck

Could be:

  • direct reclaim
  • writeback congestion
  • swap thrash
  • NUMA remote access
  • compaction / THP trouble

3. Container OOM on a healthy-looking node

Likely:

  • cgroup limit too low
  • page cache inside cgroup
  • wrong request/limit sizing
  • burst memory pattern not budgeted

4. "Free memory vanished"

Maybe page cache, slab growth, or tmpfs growth. That is a diagnostic branch, not a conclusion.

5. IO issue that looks like memory issue

Dirty pages pile up, reclaim stalls, app hangs. Storage and memory are in a toxic codependent relationship.


Practical debugging workflow

Step 1 - determine pressure, not just usage

Check:

  • free -h
  • /proc/meminfo
  • vmstat 1
  • sar -B
  • PSI memory pressure if available
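PSI deserves special mention because it measures pressure directly instead of inferring it from usage. If the kernel exposes it (4.20+ with PSI enabled), it reads like this (guarded sketch):

```shell
# "some" = share of time at least one task was stalled on memory;
# "full" = share of time all non-idle tasks were stalled.
# avg10/avg60/avg300 are decaying averages over those windows, in percent.
if [ -r /proc/pressure/memory ]; then
  psi=$(cat /proc/pressure/memory)
else
  psi="PSI not available on this kernel"
fi
echo "$psi"
```

Nonzero "full" averages are about the strongest "memory pressure is hurting work" signal Linux offers.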

Step 2 - separate categories

Ask:

  • anonymous memory high?
  • page cache high?
  • slab high?
  • dirty/writeback high?
  • swap active?
  • cgroup-limited?

Step 3 - look for symptoms of pain

  • major faults
  • swap in/out
  • direct reclaim
  • OOM messages
  • elevated IO wait
  • application latency spikes

Step 4 - account for containers and cgroups

Host totals can hide cgroup-local disasters.


Interview angles

Useful questions hidden here:

  • difference between page cache and process memory
  • what MemAvailable means
  • why Linux uses spare RAM
  • how swap helps and hurts
  • what causes OOM
  • what THP is
  • why a container can OOM even when host RAM exists
  • what NUMA is and why locality matters
  • what direct reclaim is and why latency suffers

Strong answers separate usage, reclaimability, and pressure.


Mental model to keep

Memory on Linux is an economy, not a bucket.

Pages are constantly being:

  • allocated
  • mapped
  • dirtied
  • cached
  • reclaimed
  • swapped
  • compacted
  • pinned
  • fought over by the OOM killer

The right question is not "how much is used?"

It is:

  • which pages are being used for what,
  • how expensive are they to reclaim,
  • and is the system under memory pressure severe enough to hurt work?
