
Diagnostic Questions

Work through these questions before revealing the investigation path:

  1. The email-worker pods show 100m CPU usage (20% of their 500m limit) in kubectl top, yet each worker processes only 8 jobs/second instead of the expected 100. External dependencies (SMTP, database) respond in <5ms. Where is the bottleneck?

  2. The container's cpu.stat shows 94.7% of CFS periods were throttled. What does CPU throttling mean in the context of Kubernetes cgroups? Why does kubectl top not reflect throttling?

  3. The worker has 4 threads but a 500m CPU limit. How does CFS bandwidth control distribute CPU time across threads? Why does a multi-threaded workload suffer more from throttling than a single-threaded one?

  4. A kernel upgrade changed CFS bandwidth slice behavior. The fix both increases the CPU limit (Kubernetes) and adjusts the CFS sysctl (Linux). Why is the Linux-level fix important, and why is increasing the limit alone not enough?

  5. Should CPU-bound worker pods have CPU limits at all? What are the trade-offs between CPU limits (throttle risk) and no limits (noisy neighbor risk)?
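
For questions 2 and 3, it can help to put numbers on the throttling. The sketch below is only an illustration, not part of the scenario's tooling: it parses the text of a cgroup cpu.stat file, computes the fraction of throttled periods, and estimates how quickly several busy threads would exhaust a CPU quota. The helper names and the sample counters are hypothetical (the counters were chosen to reproduce the 94.7% figure), and the arithmetic assumes the default 100 ms CFS period and that every thread runs flat out on its own core.

```python
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse 'key value' lines (the layout of a cgroup cpu.stat file) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict[str, int]) -> float:
    """Fraction of CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


def quota_ms(limit_millicores: int, period_ms: float = 100.0) -> float:
    """CPU time (ms) the cgroup may consume per period for a given CPU limit."""
    return limit_millicores / 1000.0 * period_ms


def wall_ms_until_throttled(
    limit_millicores: int, busy_threads: int, period_ms: float = 100.0
) -> float:
    """Wall-clock ms until the quota runs out, assuming each thread
    burns CPU continuously on a separate core (a simplification)."""
    return min(period_ms, quota_ms(limit_millicores, period_ms) / busy_threads)


if __name__ == "__main__":
    # Hypothetical counters, chosen to match the 94.7% in question 2.
    sample = "nr_periods 1000\nnr_throttled 947\n"
    ratio = throttled_ratio(parse_cpu_stat(sample))
    print(f"throttled periods: {ratio:.1%}")
    print(f"quota per period:  {quota_ms(500)} ms")
    print(f"4 threads run for: {wall_ms_until_throttled(500, 4)} ms per period")
```

Under these assumptions, the quota is shared by the whole cgroup rather than granted per thread, so adding busy threads shortens the wall-clock time before the group stalls for the remainder of the period.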