Remediation: Job Queue Backlog, Worker Pod CPU Throttled, Fix Is cgroup Config¶
Immediate Fix (Linux Ops — Domain C)¶
The fix is two-fold: raise the CPU limit on the worker pods for immediate relief, then increase the CFS bandwidth slice on the affected nodes so quota is handed out in larger chunks.
Step 1: Increase the CPU limit for email-workers (immediate relief)¶
$ kubectl patch deployment email-worker -n prod --type=json \
-p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/cpu","value":"2000m"}]'
deployment.apps/email-worker patched
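For context on why the limit matters: the kubelet translates a CPU limit into a CFS quota of limit × period microseconds per scheduling period (default period 100 ms). A quick back-of-the-envelope for the new 2000m limit, assuming the default period:

```shell
# CFS quota implied by the new limit, assuming the default 100 ms CFS period.
limit_millicores=2000
period_us=100000
quota_us=$(( limit_millicores * period_us / 1000 ))
echo "cfs_quota_us=$quota_us"   # 200000 us of CPU time per period (2 full cores)
```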
Step 2: Adjust the CFS bandwidth slice on affected nodes¶
$ ssh worker-node-04
$ sudo sysctl -w kernel.sched_cfs_bandwidth_slice_us=8000
kernel.sched_cfs_bandwidth_slice_us = 8000
# Make persistent
$ echo "kernel.sched_cfs_bandwidth_slice_us = 8000" | sudo tee -a /etc/sysctl.d/99-k8s-cfs.conf
$ sudo sysctl -p /etc/sysctl.d/99-k8s-cfs.conf
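Why the slice size matters: each CPU pulls quota from the cgroup's global pool in chunks of kernel.sched_cfs_bandwidth_slice_us, so a larger slice means fewer round-trips to the pool per period. A rough sketch of refill counts under the old and new slice, assuming the 200000 us quota implied by a 2000m limit:

```shell
quota_us=200000        # quota implied by a 2000m limit at the default 100 ms period
default_slice_us=5000  # kernel default for kernel.sched_cfs_bandwidth_slice_us
new_slice_us=8000
echo "refills per period: default=$(( quota_us / default_slice_us )) new=$(( quota_us / new_slice_us ))"
# prints: refills per period: default=40 new=25
```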
Step 3: Apply to all nodes with the new kernel¶
$ ansible k8s_workers -m sysctl -a "name=kernel.sched_cfs_bandwidth_slice_us value=8000 sysctl_file=/etc/sysctl.d/99-k8s-cfs.conf state=present reload=yes"
Step 4: Drain the queue backlog¶
# The workers should now process at full speed
# Monitor queue depth
$ kubectl exec rabbitmq-0 -n prod -- rabbitmqctl list_queues name messages | grep email
email-notifications 847
# (rapidly decreasing from 14,287)
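As a rough sanity check on drain time, using the 14,287-message backlog and the ~490 jobs/s aggregate rate measured during verification (integer division, so this is a floor):

```shell
backlog=14287     # messages in email-notifications when the fix landed
rate_per_s=490    # aggregate processing rate across the 5 workers
echo "approx drain time: $(( backlog / rate_per_s ))s"   # ~29s
```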
Verification¶
Domain A (Observability) — Queue depth and processing rate normal¶
# Queue depth
# rabbitmq_queue_messages{queue="email-notifications"}
# Result: 12 (baseline < 100)
# Processing rate
# rate(jobs_processed_total{service="email-worker"}[5m])
# Result: 98/s per worker (total 490/s across 5 workers)
Domain B (Kubernetes) — No CPU throttling¶
$ kubectl exec email-worker-8d7e6f5a4-j9k8l -n prod -- cat /sys/fs/cgroup/cpu/cpu.stat
nr_periods 12847
nr_throttled 124
throttled_time 289471000
# Throttle rate: 0.97% (was 94.7%)
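The throttle rate quoted in the comment is nr_throttled / nr_periods; recomputing it from the counters above:

```shell
# cgroup v1 cpu.stat counters from the verification output above.
nr_periods=12847
nr_throttled=124
awk -v t="$nr_throttled" -v p="$nr_periods" 'BEGIN { printf "%.2f%%\n", 100 * t / p }'
```

which prints 0.97%, matching the figure above (and down from 94.7% before the fix).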
Domain C (Linux Ops) — CFS slice configured¶
$ ssh worker-node-04 "sysctl kernel.sched_cfs_bandwidth_slice_us"
kernel.sched_cfs_bandwidth_slice_us = 8000
$ ssh worker-node-04 "cat /etc/sysctl.d/99-k8s-cfs.conf"
kernel.sched_cfs_bandwidth_slice_us = 8000
Prevention¶
- Monitoring: Add CPU throttling metrics to dashboards and alerts. The critical metric is
container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total.
- alert: ContainerCPUThrottling
  expr: |
    rate(container_cpu_cfs_throttled_periods_total[5m])
      / rate(container_cpu_cfs_periods_total[5m]) > 0.5
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.container }} is CPU-throttled {{ $value | humanizePercentage }}"
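Prometheus only loads alerting rules from a rules file with a groups: wrapper; a minimal sketch of deploying the rule above (file path and group name are illustrative), which can then be validated with promtool check rules where promtool is installed:

```shell
# Wrap the alert in a loadable rules file (path and group name are illustrative).
cat > /tmp/cpu-throttle-rules.yml <<'EOF'
groups:
  - name: cpu-throttling
    rules:
      - alert: ContainerCPUThrottling
        expr: |
          rate(container_cpu_cfs_throttled_periods_total[5m])
            / rate(container_cpu_cfs_periods_total[5m]) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} is CPU-throttled {{ $value | humanizePercentage }}"
EOF
# promtool check rules /tmp/cpu-throttle-rules.yml   # run on the Prometheus host
```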
- Runbook: After any kernel upgrade, verify CFS bandwidth behavior has not changed. Test CPU-bound workloads in staging before promoting the kernel to production nodes.
- Architecture: Consider removing CPU limits entirely for batch/worker workloads (set only requests, not limits). This avoids CFS throttling while still providing scheduling guarantees. If limits are required, use burstable QoS with generous limits relative to actual usage.
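A requests-only spec for the workers could look like the fragment below (values illustrative; written to a scratch file here for inspection). With no limits.cpu, the kubelet leaves the CFS quota unset (cpu.cfs_quota_us = -1), so the bandwidth controller never throttles the pod:

```shell
# Hypothetical requests-only resources block for a batch worker.
cat > /tmp/email-worker-resources.yml <<'EOF'
resources:
  requests:
    cpu: "1"        # scheduling guarantee only; no CFS quota is enforced
    memory: 512Mi
  # no limits.cpu: Burstable QoS, no CFS throttling
EOF
grep -c 'requests:' /tmp/email-worker-resources.yml
```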