Diagnostic Questions
Before revealing the investigation path:
- The email-worker pods show 100m CPU usage (20% of their 500m limit) in `kubectl top`, yet each worker processes only 8 jobs/second instead of the expected 100. External dependencies (SMTP, database) respond in <5ms. Where is the bottleneck?
- The container's `cpu.stat` shows 94.7% of CFS periods were throttled. What does CPU throttling mean in the context of Kubernetes cgroups? Why does `kubectl top` not reflect throttling?
- The worker has 4 threads but a 500m CPU limit. How does CFS bandwidth control distribute CPU time across threads? Why does a multi-threaded workload suffer more from throttling than a single-threaded one?
- A kernel upgrade changed CFS bandwidth slice behavior. The fix includes both increasing the CPU limit (Kubernetes) and adjusting the CFS sysctl (Linux). Why is the Linux-level fix important, and why is just increasing the limit not enough?
- Should CPU-bound worker pods have CPU limits at all? What are the trade-offs between CPU limits (throttle risk) and no limits (noisy neighbor risk)?
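
As a starting point for the `cpu.stat` and thread questions above, the following Python sketch shows how a throttled-period percentage can be derived from the `nr_periods` and `nr_throttled` counters, and how quickly several busy threads can consume a CFS quota. It is illustrative only: the sample counters are made up to match the 94.7% figure, the helper names are hypothetical, and the arithmetic assumes the default 100 ms CFS period with all threads runnable in parallel on separate cores.

```python
# Illustrative sketch only. SAMPLE_CPU_STAT is invented to match the 94.7%
# figure in the questions; it is not output captured from a real pod.
SAMPLE_CPU_STAT = "nr_periods 1000\nnr_throttled 947\n"


def parse_cpu_stat(text: str) -> dict:
    """Parse the 'key value' lines of a cgroup cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict) -> float:
    """Fraction of CFS enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


def quota_ms(limit_millicores: int, period_ms: float = 100.0) -> float:
    """CPU time allowed per period; a 1000m limit equals one full period."""
    return limit_millicores / 1000.0 * period_ms


def wall_ms_until_throttled(limit_millicores: int, busy_threads: int,
                            period_ms: float = 100.0) -> float:
    """Wall-clock time before busy threads running in parallel (assumed to be
    on separate cores) use up the shared quota for the period."""
    return min(period_ms, quota_ms(limit_millicores, period_ms) / busy_threads)


if __name__ == "__main__":
    ratio = throttled_ratio(parse_cpu_stat(SAMPLE_CPU_STAT))
    print(f"throttled periods: {ratio:.1%}")                       # 94.7%
    print(f"quota per period: {quota_ms(500)} ms")                 # 50.0 ms
    print(f"4 threads throttled after: "
          f"{wall_ms_until_throttled(500, 4)} ms")                 # 12.5 ms
```

Under these assumptions, a 500m limit gives the whole container 50 ms of CPU time per 100 ms period, and four simultaneously busy threads can use that up in 12.5 ms of wall-clock time, leaving them stalled for the rest of the period. Real workloads rarely keep every thread busy at once, so treat the numbers as an upper bound on how fast the quota can drain rather than a model of the email-worker itself.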