Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config¶
Immediate Fix (Linux Ops — Domain C)¶
The fix is two-fold: raise the CPU limit on the worker pods for immediate relief, then increase the CFS bandwidth slice on the affected nodes so quota is handed out in larger chunks.
Step 1: Increase the CPU limit for email-workers (immediate relief)¶
$ kubectl patch deployment email-worker -n prod --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/cpu","value":"2000m"}]'
deployment.apps/email-worker patched
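For context on why the limit matters: the kubelet translates a CPU limit into a CFS quota of limit × period microseconds per scheduling period (default period 100 ms). A quick back-of-the-envelope for the new 2000m limit, assuming the default period:

```shell
# CFS quota implied by the new limit, assuming the default 100 ms CFS period.
limit_millicores=2000
period_us=100000
quota_us=$(( limit_millicores * period_us / 1000 ))
echo "cfs_quota_us=$quota_us"   # 200000 us of CPU time per period (2 full cores)
```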
Step 2: Adjust the CFS bandwidth slice on affected nodes¶
$ ssh worker-node-04
$ sudo sysctl -w kernel.sched_cfs_bandwidth_slice_us=8000
kernel.sched_cfs_bandwidth_slice_us = 8000
# Make persistent
$ echo "kernel.sched_cfs_bandwidth_slice_us = 8000" | sudo tee -a /etc/sysctl.d/99-k8s-cfs.conf
$ sudo sysctl -p /etc/sysctl.d/99-k8s-cfs.conf
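Why the slice size matters: each CPU pulls quota from the cgroup's global pool in chunks of kernel.sched_cfs_bandwidth_slice_us, so a larger slice means fewer round-trips to the pool per period. A rough sketch of refill counts under the old and new slice, assuming the 200000 us quota implied by a 2000m limit:

```shell
quota_us=200000        # quota implied by a 2000m limit at the default 100 ms period
default_slice_us=5000  # kernel default for kernel.sched_cfs_bandwidth_slice_us
new_slice_us=8000
echo "refills per period: default=$(( quota_us / default_slice_us )) new=$(( quota_us / new_slice_us ))"
# prints: refills per period: default=40 new=25
```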
Step 3: Apply to all nodes with the new kernel¶
$ ansible k8s_workers -m sysctl -a "name=kernel.sched_cfs_bandwidth_slice_us value=8000 sysctl_file=/etc/sysctl.d/99-k8s-cfs.conf state=present reload=yes"
Step 4: Drain the queue backlog¶
# The workers should now process at full speed
# Monitor queue depth
$ kubectl exec rabbitmq-0 -n prod -- rabbitmqctl list_queues name messages | grep email
email-notifications 847
# (rapidly decreasing from 14,287)
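As a rough sanity check on drain time, using the 14,287-message backlog and the ~490 jobs/s aggregate rate measured during verification (integer division, so this is a floor):

```shell
backlog=14287     # messages in email-notifications when the fix landed
rate_per_s=490    # aggregate processing rate across the 5 workers
echo "approx drain time: $(( backlog / rate_per_s ))s"   # ~29s
```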
Verification¶
Domain A (Observability) — Queue depth and processing rate normal¶
# Queue depth
# rabbitmq_queue_messages{queue="email-notifications"}
# Result: 12 (baseline < 100)
# Processing rate
# rate(jobs_processed_total{service="email-worker"}[5m])
# Result: 98/s per worker (total 490/s across 5 workers)
Domain B (Kubernetes) — No CPU throttling¶
$ kubectl exec email-worker-8d7e6f5a4-j9k8l -n prod -- cat /sys/fs/cgroup/cpu/cpu.stat
nr_periods 12847
nr_throttled 124
throttled_time 289471000
# Throttle rate: 0.97% (was 94.7%)
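The throttle rate quoted in the comment is nr_throttled / nr_periods; recomputing it from the counters above:

```shell
# cgroup v1 cpu.stat counters from the verification output above.
nr_periods=12847
nr_throttled=124
awk -v t="$nr_throttled" -v p="$nr_periods" 'BEGIN { printf "%.2f%%\n", 100 * t / p }'
```

which prints 0.97%, matching the figure above (and down from 94.7% before the fix).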
Domain C (Linux Ops) — CFS slice configured¶
$ ssh worker-node-04 "sysctl kernel.sched_cfs_bandwidth_slice_us"
kernel.sched_cfs_bandwidth_slice_us = 8000
$ ssh worker-node-04 "cat /etc/sysctl.d/99-k8s-cfs.conf"
kernel.sched_cfs_bandwidth_slice_us = 8000
Prevention¶
- Monitoring: Add CPU throttling metrics to dashboards and alerts. The critical metric is
container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total.
- alert: ContainerCPUThrottling
  expr: |
    rate(container_cpu_cfs_throttled_periods_total[5m])
      / rate(container_cpu_cfs_periods_total[5m]) > 0.5
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.container }} is CPU-throttled {{ $value | humanizePercentage }}"
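Prometheus only loads alerting rules from a rules file with a groups: wrapper; a minimal sketch of deploying the rule above (file path and group name are illustrative), which can then be validated with promtool check rules where promtool is installed:

```shell
# Wrap the alert in a loadable rules file (path and group name are illustrative).
cat > /tmp/cpu-throttle-rules.yml <<'EOF'
groups:
  - name: cpu-throttling
    rules:
      - alert: ContainerCPUThrottling
        expr: |
          rate(container_cpu_cfs_throttled_periods_total[5m])
            / rate(container_cpu_cfs_periods_total[5m]) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} is CPU-throttled {{ $value | humanizePercentage }}"
EOF
# promtool check rules /tmp/cpu-throttle-rules.yml   # run on the Prometheus host
```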
- Runbook: After any kernel upgrade, verify CFS bandwidth behavior has not changed. Test CPU-bound workloads in staging before promoting the kernel to production nodes.
- Architecture: Consider removing CPU limits entirely for batch/worker workloads (set only requests, not limits). This avoids CFS throttling while still providing scheduling guarantees. If limits are required, use burstable QoS with generous limits relative to actual usage.
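A requests-only spec for the workers could look like the fragment below (values illustrative; written to a scratch file here for inspection). With no limits.cpu, the kubelet leaves the CFS quota unset (cpu.cfs_quota_us = -1), so the bandwidth controller never throttles the pod:

```shell
# Hypothetical requests-only resources block for a batch worker.
cat > /tmp/email-worker-resources.yml <<'EOF'
resources:
  requests:
    cpu: "1"        # scheduling guarantee only; no CFS quota is enforced
    memory: 512Mi
  # no limits.cpu: Burstable QoS, no CFS throttling
EOF
grep -c 'requests:' /tmp/email-worker-resources.yml
```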