- k8s
- l1
- topic-pack
- oomkilled

---

Portal | Level: L1: Foundations | Topics: OOMKilled (alias) | Domain: Kubernetes
# OOMKilled - Primer

## Why This Matters
OOMKilled is the second most common Kubernetes troubleshooting scenario after CrashLoopBackOff. When a container exceeds its memory cgroup limit, the Linux kernel's OOM killer terminates the process with no warning and no graceful shutdown. Your pod restarts, your request fails, and your on-call engineer gets paged. Misunderstanding how resource requests and limits work leads to random pod kills that look nondeterministic but are entirely predictable once you understand the mechanics.
## Core Concepts

### 1. What OOMKilled Means
OOMKilled means the Linux kernel's Out-Of-Memory killer terminated your container's main process because it exceeded the memory limit enforced by its cgroup. The process receives SIGKILL (signal 9), which cannot be caught or handled. The exit code is 137 (128 + 9).
When you see this in kubectl describe pod:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Sun, 15 Mar 2026 02:14:33 +0000
Finished: Sun, 15 Mar 2026 02:17:58 +0000
That tells you the container used more memory than its resources.limits.memory allowed. The kernel enforced the ceiling. There is no negotiation.
Name origin: The Linux OOM (Out-Of-Memory) killer was introduced in the 2.6 kernel series. The term "OOM killer" was coined by the kernel community to describe the subsystem that selects and terminates processes when the system runs out of memory. The exit code 137 is a Unix convention: 128 + signal number, where signal 9 is SIGKILL. So 137 is a shorthand for "killed by SIGKILL." Any time you see exit code 137 in containers, it means the process was forcibly killed — almost always by the OOM killer.
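You can see the 128 + signal convention outside Kubernetes with any POSIX shell, no container required:

```shell
# A process that dies from SIGKILL (signal 9) exits with 128 + 9 = 137
sh -c 'kill -9 $$'    # child shell sends SIGKILL to itself
echo "exit code: $?"  # prints: exit code: 137
```

The same arithmetic applies to any signal: a SIGTERM kill (signal 15) would surface as exit code 143.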
Remember: The OOM diagnostic mnemonic is DTE: Describe the pod (check `Reason: OOMKilled`, exit code 137), Top the pods (`kubectl top pod` to see current memory), Events on the node (`dmesg | grep oom` to distinguish container-level from node-level OOM). Three commands, three layers of diagnosis.
### 2. Requests vs Limits
Kubernetes resource management has two knobs:
| Field | Purpose | Enforced by |
|---|---|---|
| `resources.requests.memory` | Scheduling guarantee: the scheduler only places the pod on a node with at least this much allocatable memory | kube-scheduler |
| `resources.limits.memory` | Hard ceiling: the cgroup kills the process if it exceeds this | Linux kernel (cgroup v1/v2) |
What happens when you set them wrong:
- No limits set: The container can consume unbounded memory. If it exhausts node memory, the kernel OOM killer picks victims across the entire node — not just your pod. Other pods die too.
- Limit too low: The container gets OOMKilled repeatedly. It enters CrashLoopBackOff as Kubernetes keeps restarting it and it keeps hitting the wall.
- Request equals limit (Guaranteed QoS): The container gets exactly what it asks for. It is the last to be evicted under memory pressure.
- Request < limit (Burstable QoS): The container can burst above its request up to its limit. Under node pressure, it may be evicted before Guaranteed pods.
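As a concrete sketch, a Burstable pod with request below limit looks like this (names and values are illustrative):

```yaml
# Illustrative pod: Burstable QoS because requests < limits
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: myapp:latest
      resources:
        requests:
          memory: "256Mi"   # scheduling guarantee
        limits:
          memory: "512Mi"   # cgroup ceiling; exceed it and the OOM killer fires
```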
### 3. Diagnosing OOMKilled
Start with kubectl describe pod:
$ kubectl describe pod myapp-7f8c9d6b4-x2k9p
...
Containers:
myapp:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Sun, 15 Mar 2026 02:14:33 +0000
Finished: Sun, 15 Mar 2026 02:17:58 +0000
Limits:
memory: 512Mi
Requests:
memory: 256Mi
Restart Count: 7
...
Check current memory usage across pods:
$ kubectl top pod -n production
NAME CPU(cores) MEMORY(bytes)
myapp-7f8c9d6b4-x2k9p 45m 498Mi
myapp-7f8c9d6b4-r3m7q 38m 472Mi
worker-5d4c8b7a2-k8n1p 12m 189Mi
Extract the OOMKilled reason programmatically:
$ kubectl get pod myapp-7f8c9d6b4-x2k9p -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
OOMKilled
### 4. Linux OOM Killer Mechanics
The kernel OOM killer is the enforcement layer beneath Kubernetes. When a cgroup hits its memory limit, the kernel invokes the OOM killer scoped to that cgroup (container-level OOM). When the entire node runs out of memory, the kernel invokes the global OOM killer (node-level OOM).
Key files in /proc:
$ cat /proc/meminfo | head -10
MemTotal: 16384000 kB
MemFree: 204800 kB
MemAvailable: 512000 kB
Buffers: 102400 kB
Cached: 819200 kB
SwapCached: 0 kB
Active: 12288000 kB
Inactive: 2048000 kB
Active(anon): 10240000 kB
Inactive(anon): 1024000 kB
Each process has an OOM score:
- oom_score: ranges from 0 to ~1000. Higher = more likely to be killed.
- oom_score_adj: ranges from -1000 to 1000. Kubernetes sets this based on QoS class.
How the kernel picks victims:
1. Calculate each process's oom_score based on memory usage proportion
2. Add oom_score_adj to get the final score
3. Kill the process with the highest score
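On any Linux host you can inspect both values directly; run this inside a container to see the bias Kubernetes applied (for a plain host process, oom_score_adj is 0):

```shell
# oom_score is the kernel-computed badness for the current process;
# oom_score_adj is the bias Kubernetes writes based on QoS class
cat /proc/self/oom_score
cat /proc/self/oom_score_adj
```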
Kubernetes sets oom_score_adj by QoS class:
| QoS Class | oom_score_adj | Kill priority |
|---|---|---|
| BestEffort | 1000 | Killed first |
| Burstable | 2-999 (scaled) | Killed second |
| Guaranteed | -997 | Killed last |
This is why it is not always your container that dies. Under node-level memory pressure, the kernel kills BestEffort pods first, then Burstable, then Guaranteed.
### 5. Common Causes
Memory leak in application code: The container steadily increases memory usage until it hits the limit. Restart count climbs over hours or days.
JVM heap not matching container limit: A Java app with -Xmx1g in a container limited to 512Mi will be OOMKilled immediately. The JVM requests heap memory from the OS, the cgroup enforces the ceiling, and the process dies.
# Wrong: JVM wants 1GB, container limit is 512Mi
resources:
limits:
memory: "512Mi"
env:
- name: JAVA_OPTS
value: "-Xmx1g"
Sidecar containers eating memory: An Istio sidecar, log collector, or monitoring agent has its own limit, but its usage counts toward the pod's total footprint on the node. If the sidecar needs 200Mi on top of your app's 512Mi, the pod occupies more memory than you budgeted, and any container in the pod can be the one that hits its limit. Check all containers in the pod, not just the main one.
No resource limits set (Burstable/BestEffort QoS): Without limits, a misbehaving container can consume all available node memory and trigger node-level OOM, taking down other pods.
Undersized limits for the workload: A data processing job that loads a 2GB dataset into memory with a 1Gi limit will always fail. Profile first, then set limits.
### 6. Fixing It
Profile first, do not guess: Run the application under realistic load and observe actual memory usage with kubectl top pod over time before setting limits. Do not pick round numbers and hope.
Use Vertical Pod Autoscaler (VPA) for recommendations:
$ kubectl get vpa myapp-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0]}'
{
"containerName": "myapp",
"lowerBound": {"memory": "200Mi"},
"target": {"memory": "384Mi"},
"upperBound": {"memory": "600Mi"}
}
Fix the memory leak: Profile the application with language-appropriate tools (pprof for Go, heap dumps for Java, tracemalloc for Python). An OOMKill that happens after hours of uptime usually indicates a leak.
Gotcha: Before Java 10 (released March 2018), the JVM was not cgroup-aware. It would read `/proc/meminfo` to determine total system memory, ignoring the container's cgroup limit entirely. A JVM with `-Xmx` calculated as a percentage of "system memory" in a 512Mi container on a 64 GB node would try to allocate gigabytes of heap and get OOMKilled immediately. Java 10+ reads cgroup limits by default. If you are stuck on Java 8, use the flags `-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap` (8u131+), superseded by `-XX:+UseContainerSupport` in 8u191+.
Configure the JVM to respect container limits: on Java 10+ (or 8u191+), set `-XX:MaxRAMPercentage=75.0` instead of a fixed `-Xmx`. This tells the JVM to use at most 75% of the cgroup memory limit for heap, leaving room for non-heap memory (metaspace, thread stacks, native allocations).
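A minimal deployment fragment might look like this (container and env names are illustrative; `JAVA_TOOL_OPTIONS` is picked up automatically by the JVM):

```yaml
# Heap sized as a fraction of the cgroup limit rather than a fixed -Xmx
resources:
  limits:
    memory: "512Mi"
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"
```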
Understand QoS classes:
| QoS Class | Condition | Eviction priority |
|---|---|---|
| Guaranteed | requests == limits for all containers | Last evicted |
| Burstable | requests < limits (or only one set) | Middle |
| BestEffort | No requests or limits set | First evicted |
For critical workloads, set requests equal to limits to get Guaranteed QoS.
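A Guaranteed-class container simply sets requests equal to limits for every resource (values are illustrative):

```yaml
# requests == limits for all resources in all containers -> Guaranteed QoS
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

Note that Guaranteed QoS requires requests to equal limits for both CPU and memory, across every container in the pod.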
### 7. Node-Level OOM vs Container-Level OOM
Container-level OOM: The container exceeded its own cgroup memory limit. Only that container is killed. You see OOMKilled in kubectl describe pod.
Node-level OOM: The node itself ran out of memory. The kernel's global OOM killer picks victims across all processes. Multiple pods may die simultaneously. The kubelet may also evict pods before the kernel OOM killer fires.
How to tell the difference:
# Check kernel OOM events on the node
$ dmesg | grep -i "oom-kill\|out of memory"
[482918.392] myapp invoked oom-killer: gfp_mask=0xcc0, order=0
[482918.401] Memory cgroup out of memory: Killed process 18234 (java)
# Check kubelet logs for eviction
$ journalctl -u kubelet | grep -i "evict\|memory"
Mar 15 02:17:55 node01 kubelet[1892]: eviction manager: attempting to reclaim memory
Mar 15 02:17:55 node01 kubelet[1892]: eviction manager: must evict pod(s) to reclaim memory
The kubelet has eviction thresholds that trigger before the kernel OOM killer:
--eviction-hard=memory.available<100Mi
--eviction-soft=memory.available<300Mi
--eviction-soft-grace-period=memory.available=30s
When memory.available drops below the hard eviction threshold, the kubelet evicts pods starting with BestEffort, then Burstable. If eviction is too slow and memory runs out completely, the kernel OOM killer fires as a last resort.
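The same thresholds can be expressed in a KubeletConfiguration file rather than CLI flags (field names per the kubelet.config.k8s.io/v1beta1 API):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"
evictionSoft:
  memory.available: "300Mi"
evictionSoftGracePeriod:
  memory.available: "30s"
```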
Key diagnostic difference:
- Container OOM: Single pod affected, Exit Code: 137, Reason: OOMKilled in pod status
- Node OOM: Multiple pods affected, dmesg shows kernel OOM, kubelet logs show eviction activity, kubectl describe node shows MemoryPressure: True
### 8. Prevention
ResourceQuotas: Enforce per-namespace memory budgets so one team cannot starve others:
apiVersion: v1
kind: ResourceQuota
metadata:
name: mem-quota
namespace: production
spec:
hard:
requests.memory: "8Gi"
limits.memory: "16Gi"
LimitRanges: Set default limits for containers that do not specify their own:
apiVersion: v1
kind: LimitRange
metadata:
name: mem-limit-range
namespace: production
spec:
limits:
- default:
memory: "512Mi"
defaultRequest:
memory: "256Mi"
type: Container
Monitoring with Prometheus: Alert before the OOM killer fires:
# Container memory usage as percentage of limit
container_memory_working_set_bytes{container!=""}
/ on(namespace, pod, container)
container_spec_memory_limit_bytes{container!=""}
> 0.9
The metric container_memory_working_set_bytes is what the OOM killer actually looks at — it excludes inactive file cache. Do not use container_memory_usage_bytes for OOM prediction because it includes reclaimable cache.
Set alerts at 80% and 90% thresholds. The 80% alert gives you time to investigate. The 90% alert means OOMKill is imminent.
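As a sketch, both thresholds as Prometheus alerting rules (alert names, durations, and severity labels are illustrative choices):

```yaml
groups:
  - name: container-memory
    rules:
      - alert: ContainerMemoryHigh        # 80%: time to investigate
        expr: |
          container_memory_working_set_bytes{container!=""}
            / on(namespace, pod, container)
          container_spec_memory_limit_bytes{container!=""} > 0.80
        for: 10m
        labels:
          severity: warning
      - alert: ContainerOOMKillImminent   # 90%: OOMKill is imminent
        expr: |
          container_memory_working_set_bytes{container!=""}
            / on(namespace, pod, container)
          container_spec_memory_limit_bytes{container!=""} > 0.90
        for: 5m
        labels:
          severity: critical
```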
## Wiki Navigation

### Related Content
- Oomkilled Flashcards (CLI) (flashcard_deck, L1) — OOMKilled (alias)