Grading Rubric

Criterion	Strong (3)	Adequate (2)	Weak (1)
Identified the misleading symptom	Recognized that low CPU in `kubectl top` alongside high per-process CPU usage pointed to CFS throttling; confirmed it in `cpu.stat`	Noticed the CPU numbers did not add up but took a while to locate the throttling metrics	Investigated SMTP, the database, or application code for the bottleneck
Found the root cause in the kubernetes domain	Identified CFS throttling caused by CPU limits on a multi-threaded workload	Found that the CPU limit was too low but not why the same limit worked before the kernel upgrade	Assumed the workers needed more replicas or the queue had a consumer bug
Remediated in the linux_ops domain	Adjusted the CFS bandwidth slice sysctl (`kernel.sched_cfs_bandwidth_slice_us`) on all nodes, updated CPU limits, and applied the changes via Ansible	Increased CPU limits but did not adjust the kernel-level sysctl	Only increased replicas (scaling around the problem rather than fixing it)
Cross-domain thinking	Explained the full chain: kernel upgrade -> CFS behavior change -> throttling -> worker slowdown -> queue backlog	Acknowledged throttling but missed the connection to the kernel upgrade	Treated it as an application-performance or capacity-planning issue
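The Strong column expects the candidate to confirm throttling from `cpu.stat` rather than from `kubectl top`. The Python sketch below shows one way to run that check. It is illustrative only and not part of the scenario. It assumes cgroup v2 field names (`nr_periods`, `nr_throttled`, `throttled_usec`); cgroup v1 reports `throttled_time` in nanoseconds instead. The helper names and sample values are hypothetical. `burst_window_ms` is an idealized model of why a multi-threaded worker gets throttled under a CPU limit: N busy threads use up a per-period quota N times faster than one thread would.

```python
"""Sketch: quantify CFS throttling from a cgroup's cpu.stat.

Assumption: cgroup v2 field names (nr_periods, nr_throttled, throttled_usec).
cgroup v1 exposes throttled_time (nanoseconds) instead of throttled_usec.
All helper names here are hypothetical, for illustration only.
"""


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of cpu.stat into a dict of ints."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_ratio(stats: dict[str, int]) -> float:
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


def burst_window_ms(limit_millicores: int, threads: int, period_ms: float = 100.0) -> float:
    """Wall-clock time until the per-period quota runs out.

    Idealized model: every thread is runnable and has its own core, so
    `threads` threads consume quota `threads` times faster than one thread.
    """
    quota_ms = limit_millicores / 1000 * period_ms
    return min(quota_ms / threads, period_ms)


if __name__ == "__main__":
    # Hypothetical sample; inside a cgroup v2 container the real file is
    # typically /sys/fs/cgroup/cpu.stat.
    sample = "usage_usec 9000000\nnr_periods 1000\nnr_throttled 620\nthrottled_usec 41000000\n"
    stats = parse_cpu_stat(sample)
    print(f"throttled in {throttle_ratio(stats):.0%} of periods")
    window = burst_window_ms(limit_millicores=1000, threads=8)
    print(f"quota exhausted after {window} ms of each 100 ms period")
```

With a 1-CPU limit (100 ms of quota per default 100 ms period), eight busy threads use up the quota in 12.5 ms and stay throttled for the remaining 87.5 ms of each period. A high `nr_throttled / nr_periods` ratio is the signal the Strong column is looking for.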

Prerequisite Topic Packs

monitoring-fundamentals — needed for Domain A investigation (queue metrics, processing rate monitoring)
cgroups-namespaces — needed for Domain B root cause (CFS bandwidth control, CPU quotas, throttling)
k8s-pods-and-scheduling — needed for Domain B (resource requests/limits, QoS classes)
linux-kernel-tuning — needed for Domain C remediation (sysctl, CFS scheduler, kernel upgrades)
linux-performance — needed for Domain C (CPU profiling, process scheduling)