Anti-Primer: Kubernetes Pods And Scheduling

Everything that can go wrong, will — and in this story, it does.

The Setup

A developer is deploying a memory-intensive machine learning inference service to a shared Kubernetes cluster. They need GPU nodes but the cluster has limited GPU capacity. The deployment must be live by end of day for a demo.

The Timeline

Hour 0: No Resource Requests

With the deadline looming, the developer deploys the pods without resource requests to 'let Kubernetes figure it out'; it seems like the fastest path forward. The result: pods land on nodes with insufficient memory and are OOMKilled repeatedly, and the scheduler has no information to make good placement decisions.

Footgun #1: No Resource Requests — without requests, the scheduler places pods blindly; they land on memory-starved nodes and are OOMKilled in a loop.
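The fix is a few lines of YAML. A minimal sketch of a container spec with requests and limits — the image name and the resource numbers here are illustrative placeholders, not measured values; requests should come from observed usage:

```yaml
# Pod spec fragment: requests tell the scheduler what the pod needs,
# limits cap what it may consume. Numbers below are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  containers:
    - name: inference
      image: registry.example.com/ml/inference:v1  # placeholder image
      resources:
        requests:
          memory: "8Gi"        # reflect actual observed usage
          cpu: "2"
          nvidia.com/gpu: 1
        limits:
          memory: "8Gi"
          nvidia.com/gpu: 1    # GPU (extended) resources must be set as limits
```

With a memory request in place, the scheduler only considers nodes with that much allocatable memory, so the OOMKill loop never starts.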

Nobody notices yet. The engineer moves on to the next task.

Hour 1: Wrong Node Selector

Under time pressure, the team chooses speed over caution: the deployment uses a nodeSelector label that matches no GPU node in the cluster. The result: pods stay Pending indefinitely with 'no nodes available' events, and the engineer concludes the cluster itself is broken.

Footgun #2: Wrong Node Selector — a nodeSelector that matches no node leaves pods Pending forever; the scheduler reports 'no nodes available' and never recovers on its own.
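A sketch of a selector that points at real GPU nodes — the label key and value below are assumptions (clusters label GPU nodes differently); confirm what your nodes actually carry with kubectl get nodes --show-labels before depending on it:

```yaml
# Pod spec fragment: nodeSelector must match labels that actually exist
# on the target nodes. A typo'd key or value schedules nothing, silently.
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"  # assumed label; verify on your cluster
```

A selector is an exact string match on both key and value, so "True" vs "true" or a missing label prefix is enough to strand every pod in Pending.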

The first mistake is still invisible, making the next shortcut feel justified.

Hour 2: Anti-Affinity Blocks Scheduling

Next, the developer adds requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity to a cluster with too few matching nodes. Nobody pushes back; the shortcut looks harmless in the moment. The result: the third replica cannot schedule, because every eligible node already hosts one replica.

Footgun #3: Anti-Affinity Blocks Scheduling — hard (required) anti-affinity on a small node pool caps the replica count at the node count; the extra replicas sit Pending.
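The safer variant is soft anti-affinity. A minimal sketch, assuming the pods carry an app: inference label (an assumption; use whatever labels your pod template sets):

```yaml
# Pod template fragment: preferred (soft) anti-affinity spreads replicas
# across nodes when possible, but still schedules when nodes run out.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: inference  # assumed pod label
            topologyKey: kubernetes.io/hostname
```

With preferred rather than required, the scheduler treats spreading as a scoring preference instead of a hard constraint, so the third replica lands on an already-occupied node rather than staying Pending.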

Pressure is mounting. The team is behind schedule and cutting more corners.

Hour 3: Init Container Image Pull Failure

An init container references an image in a private registry, but the pod has no imagePullSecrets. The team has gotten away with similar shortcuts before, so nobody raises a flag. The result: pods stick in Init:ImagePullBackOff, the main container never starts, and the error is easy to miss.

Footgun #4: Init Container Image Pull Failure — a private-registry image without imagePullSecrets leaves pods in Init:ImagePullBackOff; the main container never starts.
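A sketch of the fix — the secret name and image are assumptions; the secret must be a docker-registry-type Secret living in the same namespace as the pod:

```yaml
# Pod spec fragment: imagePullSecrets is set once at the pod level and
# covers init containers as well as regular containers.
spec:
  imagePullSecrets:
    - name: private-registry-creds  # assumed secret name
  initContainers:
    - name: fetch-model
      image: registry.example.com/ml/model-fetcher:v1  # placeholder image
  containers:
    - name: inference
      image: registry.example.com/ml/inference:v1      # placeholder image
```

Because init containers fail before the main container ever starts, kubectl get pods shows only Init:ImagePullBackOff; kubectl describe pod is where the actual pull error surfaces.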

By hour 3, the compounding failures have reached critical mass. Pages fire. The war room fills up. The team scrambles to understand what went wrong while the system burns.

The Postmortem

Root Cause Chain

1. No Resource Requests
   Consequence: pods land on nodes with insufficient memory, are OOMKilled repeatedly, and the scheduler has no information to make good decisions.
   Prevented by: always set resource requests reflecting actual usage.

2. Wrong Node Selector
   Consequence: pods stay Pending forever with 'no nodes available' events; the engineer thinks the cluster is broken.
   Prevented by: verify node labels before setting selectors (kubectl get nodes --show-labels).

3. Anti-Affinity Blocks Scheduling
   Consequence: the third replica cannot schedule because all available nodes already have one replica.
   Prevented by: use preferredDuringSchedulingIgnoredDuringExecution anti-affinity when node count is limited.

4. Init Container Image Pull Failure
   Consequence: pods stuck in Init:ImagePullBackOff; the main container never starts; the error is easy to miss.
   Prevented by: verify imagePullSecrets are configured for all private-registry images.
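All four fixes can live in one manifest. A sketch of the corrected Deployment — names, labels, registry paths, and resource numbers are illustrative assumptions, not the incident's actual values:

```yaml
# Deployment sketch combining the four fixes from the root cause chain.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      imagePullSecrets:                      # fix for Footgun #4
        - name: private-registry-creds       # assumed secret name
      nodeSelector:                          # fix for Footgun #2 (label verified first)
        nvidia.com/gpu.present: "true"       # assumed label
      affinity:                              # fix for Footgun #3: soft anti-affinity
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: inference
                topologyKey: kubernetes.io/hostname
      containers:
        - name: inference
          image: registry.example.com/ml/inference:v1  # placeholder image
          resources:                         # fix for Footgun #1
            requests:
              memory: "8Gi"                  # from observed usage
              cpu: "2"
              nvidia.com/gpu: 1
            limits:
              memory: "8Gi"
              nvidia.com/gpu: 1
```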

Damage Report

  • Downtime: 2-4 hours of pod-level or cluster-wide disruption
  • Data loss: Risk of volume data loss if StatefulSets were affected
  • Customer impact: Intermittent 5xx errors, dropped connections, or full service outage
  • Engineering time to remediate: 10-20 engineer-hours for incident response, rollback, and postmortem
  • Reputation cost: On-call fatigue; delayed feature work; possible SLA breach notification

What the Primer Teaches

  • Footgun #1 (No Resource Requests): always set resource requests that reflect actual usage.
  • Footgun #2 (Wrong Node Selector): verify node labels before setting selectors; use kubectl get nodes --show-labels.
  • Footgun #3 (Anti-Affinity Blocks Scheduling): use preferredDuringSchedulingIgnoredDuringExecution anti-affinity when node count is limited.
  • Footgun #4 (Init Container Image Pull Failure): verify imagePullSecrets are configured for every private-registry image, init containers included.

Cross-References