Linux Memory Management¶
Scope¶
This document explains Linux memory management from the point of view of an admin / DevOps / performance engineer. It covers:
- virtual memory and address spaces
- page tables
- anonymous memory vs page cache
- reclaim
- slab/slub caches
- NUMA
- swap
- OOM killer
- huge pages
- common failure modes
This is not kernel-hacker source commentary. It is the practical deep dive you need to stop treating memory as a single mystery number.
Big picture¶
Linux memory is not just "RAM used by processes." Memory is divided among several competing uses:
- process private memory
- shared memory
- page cache
- kernel memory
- slab caches
- filesystem metadata caches
- network buffers
- vmalloc / direct map structures
- pinned memory
- huge pages
- swap-backed pages and swap cache
Core mental model¶
physical RAM
-> managed in pages
-> mapped into virtual address spaces
-> used for anonymous memory and file-backed cache
-> reclaimed under pressure
-> spilled to swap in some cases
-> protected / partitioned by policies, zones, NUMA, and cgroups
There is no single "free RAM good, used RAM bad" rule. Linux intentionally uses spare RAM for caching. The real question is whether reclaim is healthy and whether pressure causes latency or kills.
Virtual memory basics¶
Each process sees a virtual address space, not direct raw physical memory.
Why virtual memory exists¶
It provides:
- isolation between processes
- a stable address abstraction
- lazy allocation
- memory-mapped files
- page sharing
- copy-on-write behavior after `fork()`
- kernel/user separation
Address translation¶
A memory access goes roughly:
virtual address
-> page table lookup
-> physical address
CPU MMUs and TLBs (translation caches) make this efficient.
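The translation step above is just arithmetic on the address bits. A minimal sketch, assuming 4 KiB base pages (so the low 12 bits are the in-page offset):

```python
# Sketch: how a virtual address splits into page number + offset
# with 4 KiB base pages (illustrative arithmetic, not kernel code).
PAGE_SIZE = 4096   # 4 KiB => low 12 bits are the page offset
PAGE_SHIFT = 12

def split(vaddr: int) -> tuple[int, int]:
    """Return (virtual page number, offset within page)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)

vpn, off = split(0x7F3A_1C45_6789)
# The MMU looks up the virtual page number in the page tables
# (cached in the TLB), gets a physical frame, and re-attaches
# the same offset to form the physical address.
print(hex(vpn), hex(off))
```

Only the page number is translated; the offset passes through unchanged.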
Important consequences¶
- processes can each think they use the same addresses
- a process can reserve more virtual space than has physical backing
- a mapped file page and an anonymous heap page are not the same thing operationally
Pages and page size¶
Memory is managed in units called pages.
Common base page size on x86_64 is 4 KiB, though larger pages exist.
Why pages matter¶
Reclaim, allocation, mapping, faults, swapping, and page cache are all page-oriented.
Larger pages¶
- Transparent Huge Pages (THP)
- explicit huge pages / hugetlb
Larger pages can reduce TLB pressure but may increase fragmentation or operational complexity.
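The TLB-pressure argument is simple arithmetic. A sketch with a hypothetical 1536-entry TLB (the entry count is made up for illustration, not a specific CPU model):

```python
# Sketch: why larger pages reduce TLB pressure (illustrative numbers;
# a hypothetical 1536-entry TLB, not any specific CPU model).
TLB_ENTRIES = 1536

def tlb_coverage(page_size: int) -> int:
    """RAM reachable without a TLB miss, in bytes."""
    return TLB_ENTRIES * page_size

base = tlb_coverage(4 * 1024)         # 4 KiB base pages
huge = tlb_coverage(2 * 1024 * 1024)  # 2 MiB huge pages

print(f"4 KiB pages cover {base // 2**20} MiB without a miss")
print(f"2 MiB pages cover {huge // 2**30} GiB without a miss")
```

Same TLB, roughly 500x more RAM covered per entry set, which is the whole appeal of huge pages for large working sets.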
Anonymous memory vs file-backed memory¶
This distinction is fundamental.
Anonymous memory¶
Not backed by a filesystem file.
Examples:
- heap allocations
- stack
- most runtime object allocations
- anonymous shared mappings
This memory may be reclaimed by swap if swapping is enabled and policy allows.
File-backed memory¶
Backed by files.
Examples:
- executable text segments
- shared libraries
- memory-mapped files
- page cache for file IO
Clean file-backed pages can often be dropped and reloaded from disk, making them cheaper to reclaim than dirty anonymous memory.
Why admins get confused¶
They see "used memory" and panic, but much of it may be reclaimable page cache rather than unrecoverable process private state.
Page cache¶
Page cache stores file data in memory to avoid repeated disk IO.
This is one of the biggest reasons Linux appears to "use all RAM."
Why page cache is good¶
- speeds reads
- can buffer writes before flush
- improves filesystem performance enormously
- is automatically managed by the kernel
Dirty vs clean pages¶
- clean page cache can usually be dropped
- dirty page cache must be written back before reclaim
Important reality¶
If memory pressure grows, page cache is often reclaimed first because it is cheaper than murdering active anonymous memory.
Slab / slub allocator and kernel memory¶
Kernel memory is not just one blob. The kernel maintains caches for common object types:
- dentries
- inodes
- task structures
- networking objects
- various metadata
This is often shown as slab memory.
Why it matters¶
A system can look memory-stressed not because app heap exploded, but because kernel object caches or pinned kernel allocations grew.
Tools¶
- `/proc/slabinfo`
- `slabtop`
- `/proc/meminfo`
Allocation and page faults¶
Memory is often allocated lazily.
Example:
- process calls `malloc()`
- allocator reserves virtual address range
- physical pages may not be committed until first touch
- on access, a page fault occurs and the kernel maps/fills a page
Minor vs major faults¶
- minor fault: no disk IO needed, mapping established
- major fault: disk IO or heavier recovery needed
High fault rates can be normal or pathological depending on type and workload phase.
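You can watch this lazy-commit behavior from inside a process. A sketch using `getrusage` minor-fault counters (Linux-specific; exact counts vary by allocator and kernel, so treat the delta as indicative, not exact):

```python
# Sketch: watching minor page faults from inside a process via getrusage.
# Linux commits anonymous pages lazily; writing to fresh memory triggers
# minor faults as the kernel maps zero-filled pages on first touch.
import resource

def minor_faults() -> int:
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

before = minor_faults()
buf = bytearray(32 * 1024 * 1024)            # allocate and zero 32 MiB
buf[::4096] = b"x" * (len(buf) // 4096)      # touch one byte per 4 KiB page
after = minor_faults()
print(f"minor faults during allocation + touch: {after - before}")
```

None of these faults hit disk, so they are minor faults; a major fault would show up in `ru_majflt` instead.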
Zones and watermarks¶
Linux divides physical memory into zones for hardware/architecture constraints.
Typical x86 concepts include:
- DMA
- DMA32
- Normal
- Movable
The kernel maintains watermarks and reserve logic so allocations do not completely wreck system forward progress.
Why you care¶
You can have "memory available" overall but still fail specific allocations due to fragmentation or zone constraints.
Reclaim¶
When free memory falls, the kernel tries to reclaim pages.
Broadly, reclaim prefers pages that are cheaper to recover:
- cold page cache
- reclaimable slab
- inactive anonymous pages, possibly to swap
Active and inactive lists¶
Linux tracks page activity heuristically to decide what is likely cold enough to reclaim.
Direct reclaim vs background reclaim¶
- background reclaim is performed by kernel threads such as `kswapd`
- direct reclaim happens when an allocating thread gets dragged into reclaim work itself
Direct reclaim is bad news for latency. It means your app is now helping clean the mess instead of doing its job.
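You can estimate how much reclaim is direct versus background from `/proc/vmstat`-style counters. A sketch with made-up sample values (field names match modern kernels, but check your kernel version):

```python
# Sketch: telling background reclaim from direct reclaim using
# /proc/vmstat-style counters (sample values below are made up).
sample_vmstat = """\
pgscan_kswapd 1200000
pgscan_direct 45000
pgsteal_kswapd 1100000
pgsteal_direct 40000
"""

counters = dict(
    (name, int(value))
    for name, value in (line.split() for line in sample_vmstat.splitlines())
)

total_scans = counters["pgscan_kswapd"] + counters["pgscan_direct"]
direct_share = counters["pgscan_direct"] / total_scans
# A meaningful direct-reclaim share means allocating threads are paying
# reclaim latency themselves instead of kswapd handling it asynchronously.
print(f"direct reclaim share of scans: {direct_share:.1%}")
```

A steadily growing `pgscan_direct` during a latency incident is a strong hint that allocation paths are stalling on reclaim.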
Writeback¶
Dirty file-backed pages must eventually be written to storage.
This matters because reclaim and writeback interact:
- heavy dirtying can pressure memory
- slow storage can stall reclaim
- writeback throttling can hurt app latency
Common symptoms:
- memory pressure with lots of dirty/writeback pages
- storage bottleneck masquerading as a memory problem
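A quick sanity check for the storage-masquerading case is to estimate how long the dirty backlog would take to drain at the device's sustained write rate. Both numbers in this sketch are made up:

```python
# Sketch: estimating how long a dirty-page backlog takes to drain,
# given a storage write rate (both numbers below are made up).
dirty_kb = 2_500_000       # Dirty + Writeback from /proc/meminfo, in kB
write_rate_kb_s = 120_000  # sustained device write throughput, kB/s

drain_seconds = dirty_kb / write_rate_kb_s
print(f"~{drain_seconds:.0f}s to flush the backlog at the current rate")
# If this number is large, any reclaim that must write back dirty pages
# stalls behind the disk, and the "memory" problem is really an IO problem.
```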
Swap¶
Swap allows anonymous memory pages to be evicted from RAM to swap space.
What swap is not¶
Swap is not "always bad." It is a tradeoff tool.
Benefits¶
- gives reclaim another option besides OOM
- can smooth transient spikes
- lets cold anonymous pages leave RAM
- can keep page cache available for IO-heavy workloads
Costs¶
- if the system actively churns swapped pages, latency can become catastrophic
- swap-on-slow-disk is painful
- bad swappiness tuning can hide memory pressure until it becomes a swamp monster
Key subtlety¶
File-backed pages usually do not need swap; they can often just be dropped. Anonymous memory is the main swap candidate.
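Static swap usage is easy to compute from `/proc/meminfo`-style fields, but it is the wrong thing to alarm on. A sketch with made-up kB values:

```python
# Sketch: computing swap usage, and noting what it does NOT tell you,
# from /proc/meminfo-style fields (sample kB values are made up).
sample = {"SwapTotal": 4_194_304, "SwapFree": 3_145_728}

swap_used = sample["SwapTotal"] - sample["SwapFree"]
print(f"swap used: {swap_used} kB")
# Cold anonymous pages parked in swap are usually fine. Active churn,
# i.e. sustained swap-in/swap-out (the si/so columns in vmstat), is
# what destroys latency.
```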
OOM killer¶
If reclaim cannot free enough memory, Linux may invoke the OOM killer.
The OOM killer selects a victim based on several factors, including badness heuristics and adjustments.
Important points¶
- the process using the most RSS is not automatically the victim
- cgroup-scoped OOM can kill within a memory-limited container even if the host has RAM
- "Killed" in app logs is often your only clue if you were not watching kernel messages
Common causes¶
- memory leak
- bad limits
- container memory ceiling too low
- runaway tmpfs / page cache patterns in constrained environments
- too many concurrent workers
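When you suspect an OOM kill, the kernel log is the primary evidence. A sketch that extracts the victim from an OOM log line with a regex; the sample line is illustrative, and exact formats vary by kernel version:

```python
# Sketch: pulling the victim out of a kernel OOM log line with a regex.
# The sample line below is illustrative; exact formats vary by kernel.
import re

line = ("Out of memory: Killed process 1234 (java) "
        "total-vm:10485760kB, anon-rss:8388608kB, file-rss:0kB")

m = re.search(r"Killed process (\d+) \((\S+)\)", line)
if m:
    pid, comm = int(m.group(1)), m.group(2)
    print(f"OOM victim: pid={pid} comm={comm}")
```

If the app log just says "Killed", this kind of line in `dmesg` or the journal is usually the missing half of the story.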
NUMA¶
On NUMA systems, memory is not equally close to every CPU. Access to local node memory is faster than remote memory.
Why NUMA matters¶
- poor placement increases latency and reduces throughput
- workloads can become bottlenecked despite plenty of total RAM
- CPU pinning without memory locality awareness can hurt badly
Policies and tools¶
- `numactl`
- NUMA balancing
- memory policy controls
- cgroup cpuset interactions
Transparent Huge Pages¶
THP lets the kernel use larger pages automatically where beneficial.
Benefits¶
- fewer page table entries
- lower TLB pressure
- better performance for some memory-intensive workloads
Risks¶
- latency spikes from compaction
- weird interactions with databases or memory-sensitive systems
- fragmentation side effects
This is why some environments disable THP for certain workloads.
Memory in containers and cgroups¶
Container memory behavior is one of the most misunderstood production topics.
Key facts¶
- container memory limits are enforced via cgroups
- reclaim happens within that policy context
- page cache counts too, not just heap
- a container can hit OOM while the host still looks comfortable overall
- limits without requests/planning create chaos
Common surprise¶
A service does moderate file IO inside a container, page cache grows, limit is hit, and everyone screams "memory leak" even though the process heap is not the main issue.
Reading /proc/meminfo sanely¶
Important fields:
- `MemTotal`, `MemFree`, `MemAvailable`
- `Buffers`, `Cached`
- `Slab`, `SReclaimable`, `SUnreclaim`
- `Dirty`, `Writeback`
- `AnonPages`, `Mapped`
- `SwapTotal`, `SwapFree`
Use MemAvailable, not just MemFree¶
MemFree alone is almost useless for judging whether the system is healthy.
Linux tries to use RAM. That is not failure. Reclaim stalls, swap thrash, and OOM are failure.
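To see why the two numbers diverge, here is a deliberately rough estimate in the spirit of the kernel's `MemAvailable` calculation (the real one also accounts for watermarks and only counts part of the cache as reclaimable). Sample kB values are made up:

```python
# Sketch: why MemAvailable beats MemFree. A rough, simplified estimate;
# the kernel's real calculation subtracts watermarks and counts only
# part of the cache. Sample kB values below are made up.
sample = {
    "MemFree": 400_000,
    "Cached": 8_000_000,
    "Buffers": 200_000,
    "SReclaimable": 600_000,
}

mem_free = sample["MemFree"]
rough_available = (sample["MemFree"] + sample["Cached"]
                   + sample["Buffers"] + sample["SReclaimable"])
print(f"MemFree: {mem_free} kB, roughly available: {rough_available} kB")
```

A box with 400 MB "free" can still have gigabytes it can hand out quickly, which is exactly the case where `MemFree` alone panics people for no reason.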
Common production failure patterns¶
1. High memory usage but system fine¶
Likely healthy cache usage. Check:
- `MemAvailable`
- swap activity
- reclaim pressure
- IO latency
2. High load and terrible latency with no obvious CPU bottleneck¶
Could be:
- direct reclaim
- writeback congestion
- swap thrash
- NUMA remote access
- compaction / THP trouble
3. Container OOM on a healthy-looking node¶
Likely:
- cgroup limit too low
- page cache inside cgroup
- wrong request/limit sizing
- burst memory pattern not budgeted
4. "Free memory vanished"¶
Maybe page cache, slab growth, or tmpfs growth. That is a diagnostic branch, not a conclusion.
5. IO issue that looks like memory issue¶
Dirty pages pile up, reclaim stalls, app hangs. Storage and memory are in a toxic codependent relationship.
Practical debugging workflow¶
Step 1 - determine pressure, not just usage¶
Check:
- `free -h`
- `/proc/meminfo`
- `vmstat 1`
- `sar -B`
- PSI memory pressure if available
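PSI is the most direct pressure signal of the lot. A sketch that parses `/proc/pressure/memory`-format text; the sample values are made up, and on a real host you would read the file itself:

```python
# Sketch: parsing PSI memory pressure (/proc/pressure/memory format).
# The sample text is made up; on a real host, read the file directly.
sample = """\
some avg10=2.50 avg60=1.10 avg300=0.40 total=8123456
full avg10=0.75 avg60=0.20 avg300=0.05 total=912345
"""

psi = {}
for line in sample.splitlines():
    kind, *fields = line.split()
    psi[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}

# "some" = at least one task stalled on memory; "full" = all non-idle
# tasks stalled at once. Sustained nonzero "full" is serious pressure.
print(f"some avg10={psi['some']['avg10']}% full avg10={psi['full']['avg10']}%")
```

Usage versus pressure in one file: a host can be at 95% "used" with zero PSI, or at 60% used while `full` climbs and everything crawls.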
Step 2 - separate categories¶
Ask:
- anonymous memory high?
- page cache high?
- slab high?
- dirty/writeback high?
- swap active?
- cgroup-limited?
Step 3 - look for symptoms of pain¶
- major faults
- swap in/out
- direct reclaim
- OOM messages
- elevated IO wait
- application latency spikes
Step 4 - account for containers and cgroups¶
Host totals can hide cgroup-local disasters.
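For cgroup v2, the workload's real headroom lives in its own `memory.current` and `memory.max` files, not in host totals. A sketch with made-up stand-in values (a `memory.max` of `max` means unlimited):

```python
# Sketch: checking a cgroup's own memory headroom (cgroup v2 file names;
# values below are made-up stand-ins for memory.current / memory.max).
current_bytes = 1_900_000_000  # what memory.current would report
max_bytes = 2_000_000_000      # memory.max (the literal "max" = no limit)

headroom = max_bytes - current_bytes
print(f"cgroup headroom: {headroom / 2**20:.0f} MiB "
      f"({current_bytes / max_bytes:.0%} of limit used)")
# Host-level free memory is irrelevant here: reclaim and OOM for this
# workload are decided against the cgroup limit, page cache included.
```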
Interview angles¶
Useful questions hidden here:
- difference between page cache and process memory
- what `MemAvailable` means
- why Linux uses spare RAM
- how swap helps and hurts
- what causes OOM
- what THP is
- why a container can OOM even when host RAM exists
- what NUMA is and why locality matters
- what direct reclaim is and why latency suffers
Strong answers separate usage, reclaimability, and pressure.
Mental model to keep¶
Memory on Linux is an economy, not a bucket.
Pages are constantly being:
- allocated
- mapped
- dirtied
- cached
- reclaimed
- swapped
- compacted
- pinned
- killed over
The right question is not "how much is used?"
It is:
- which pages are being used for what,
- how expensive are they to reclaim,
- and is the system under memory pressure severe enough to hurt work?
References¶
- Linux memory management admin guide
- Linux memory management documentation
- Memory management concepts overview
- NUMA memory policy
- x86_64 memory management notes
Wiki Navigation¶
Prerequisites¶
- Linux Ops (Topic Pack, L0)
Related Content¶
- /proc Filesystem (Topic Pack, L2) — Linux Fundamentals
- Advanced Bash for Ops (Topic Pack, L1) — Linux Fundamentals
- Adversarial Interview Gauntlet (30 sequences) (Scenario, L2) — Linux Fundamentals
- Bash Exercises (Quest Ladder) (CLI) (Exercise Set, L0) — Linux Fundamentals
- Case Study: CI Pipeline Fails — Docker Layer Cache Corruption (Case Study, L2) — Linux Fundamentals
- Case Study: Container Vuln Scanner False Positive Blocks Deploy (Case Study, L2) — Linux Fundamentals
- Case Study: Disk Full Root Services Down (Case Study, L1) — Linux Fundamentals
- Case Study: Disk Full — Runaway Logs, Fix Is Loki Retention (Case Study, L2) — Linux Fundamentals
- Case Study: HPA Flapping — Metrics Server Clock Skew, Fix Is NTP (Case Study, L2) — Linux Fundamentals
- Case Study: Inode Exhaustion (Case Study, L1) — Linux Fundamentals
Pages that link here¶
- Linux Performance Tuning - Street-Level Ops
- Primer
- Symptoms
- Symptoms: Container Image Vuln Scanner False Positive, Blocks Deploy Pipeline
- Symptoms: Disk Full Alert, Cause Is Runaway Logs, Fix Is Loki Retention
- Symptoms: HPA Flapping, Metrics Server Clock Skew, Fix Is NTP Config