
Quiz: OOMKilled


4 questions

L0 (1 question)

1. What does OOMKilled mean and what exit code does it produce?

Answer: OOMKilled means the Linux kernel's OOM killer terminated the container's process because it exceeded its cgroup memory limit. The process receives SIGKILL (signal 9), which cannot be caught. The exit code is 137 (128 + 9). The container had no chance to shut down gracefully.
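The 128 + 9 arithmetic can be reproduced locally with a plain shell (no Kubernetes needed); this sketch kills a background `sleep` with SIGKILL and reads the resulting exit status:

```shell
# SIGKILL (signal 9) cannot be trapped or handled by the process.
sleep 30 &
pid=$!
kill -9 "$pid"
wait "$pid"
code=$?
echo "exit code: $code"   # 137 = 128 + 9
```

Inside a container the kernel's OOM killer sends the same signal, which is why `kubectl describe pod` shows exit code 137 after an OOMKill.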

L1 (1 question)

1. What is the difference between memory requests and memory limits, and how do they affect OOMKilled behavior?

Answer: Requests are scheduling guarantees — the scheduler only places the pod on a node with that much unreserved allocatable memory. Limits are hard ceilings enforced by the kernel cgroup. If the container exceeds its limit, it is OOMKilled. If no limit is set, the container can consume unbounded memory and may cause the node's OOM killer to target other pods as well. *Tip:* Setting requests equal to limits gives the pod Guaranteed QoS, making it the last candidate for eviction under node memory pressure (it can still be OOMKilled if it exceeds its own limit).
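As an illustration, a minimal pod manifest setting both fields (the name, image, and sizes are placeholders, not recommendations):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx        # placeholder image
    resources:
      requests:
        memory: "256Mi" # scheduler reserves this much on the node
      limits:
        memory: "512Mi" # cgroup ceiling; exceeding it triggers an OOMKill
```

If this container's working set grows past 512Mi, the kernel kills it with SIGKILL and the pod reports exit code 137.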

L2 (1 question)

1. A Java application keeps getting OOMKilled even though you set the memory limit to 2Gi and the JVM heap to 1.5G (-Xmx1536m). Why?

Answer: JVM memory is not just heap. Total JVM memory includes: heap (1.5G) + metaspace + thread stacks (~1MB per thread x N threads) + direct buffers + code cache + GC overhead. A JVM with -Xmx1536m can easily use 1.8-2.0G total. Set the container limit to at least 1.5x the heap, or use -XX:MaxRAMPercentage=75.0 to let the JVM size its heap to 75% of the container limit. *Common mistake:* Setting -Xmx equal to the container limit leaves no room for non-heap JVM memory.
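A back-of-the-envelope sketch of that footprint; every region size below is an assumed illustrative figure, not a measurement of a real process:

```shell
# Rough JVM memory footprint estimate, all values in MiB.
# The non-heap sizes are assumptions chosen for illustration.
heap=1536            # -Xmx1536m
metaspace=256        # class metadata
threads=200          # assumed thread count
stack_per_thread=1   # default -Xss is about 1 MiB on 64-bit Linux JVMs
code_cache=240       # JIT-compiled code
direct=128           # direct byte buffers
total=$((heap + metaspace + threads * stack_per_thread + code_cache + direct))
echo "estimated total: ${total} MiB (container limit: 2048 MiB)"
```

With these assumed numbers the total already exceeds the 2Gi limit, which is exactly the failure mode the question describes: the heap fits, but the whole process does not.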

L3 (1 question)

1. Your cluster is experiencing node-level OOM kills where the kernel kills pods that are within their own limits. How is this possible and how do you prevent it?

Answer: This happens when pods without memory limits (BestEffort QoS) or with requests much lower than limits (Burstable QoS) collectively overcommit node memory. The kernel OOM killer picks victims based on oom_score. Prevention: (1) Always set memory limits on all pods. (2) Use LimitRange to enforce default limits per namespace. (3) Use ResourceQuota to cap total namespace consumption. (4) Set requests close to limits to reduce overcommit. (5) Monitor node memory with Prometheus and alert before pressure triggers kills.
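For point (2), a LimitRange sketch that applies defaults to containers that omit resources (the name, namespace, and sizes are hypothetical):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-defaults    # hypothetical name
  namespace: team-a     # hypothetical namespace
spec:
  limits:
  - type: Container
    default:
      memory: "512Mi"   # becomes the limit when a container omits one
    defaultRequest:
      memory: "256Mi"   # becomes the request when a container omits one
```

With this in place, a pod created in the namespace without a resources stanza still lands in Burstable rather than BestEffort QoS, and its memory use is bounded.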