
Pattern: StatefulSet OrderedReady Deadlock

ID: FP-037 | Family: Configuration Landmine | Frequency: Uncommon | Blast Radius: Single Service | Detection Difficulty: Subtle

The Shape

StatefulSets with podManagementPolicy: OrderedReady (the default) bring pods up and down sequentially, waiting for each pod to be Ready before starting the next. If a pod cannot become Ready until a peer pod is also running (e.g., pod-0's readiness check requires a two-member quorum that only exists once pod-1 joins), the StatefulSet deadlocks: pod-0 waits for pod-1 to join, but the controller won't create pod-1 until pod-0 is Ready.
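The shape can be sketched as a manifest. This is a hypothetical example (the names, image, and probe command are invented), but it shows the two interacting pieces: the implicit OrderedReady default and a readiness probe that needs a peer:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb
spec:
  serviceName: mydb
  replicas: 3
  # podManagementPolicy is omitted, so it defaults to OrderedReady:
  # pod-1 is not created until pod-0 reports Ready.
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
        - name: mydb
          image: example/mydb:1.0   # hypothetical image
          readinessProbe:
            # Hypothetical health check that only passes once a
            # 2-member quorum exists -- which requires pod-1 to be
            # running. Under OrderedReady, that can never happen.
            exec:
              command: ["/bin/check-quorum", "--min-members=2"]
            periodSeconds: 10
```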

How You'll See It

In Kubernetes

$ kubectl get pods
NAME        READY   STATUS    RESTARTS
mydb-0      0/1     Running   0        # readiness probe failing: waiting for a quorum peer

Note that mydb-1 does not appear at all: the StatefulSet controller won't create it until mydb-0 is Ready. kubectl describe pod mydb-0 shows the readiness probe failing because the database requires a quorum (at least 2 members). Deadlock.

In CI/CD

A database CI environment uses a StatefulSet for a 3-node cluster. Each node requires at least one peer to form quorum before its readiness probe passes. OrderedReady means only one pod starts at a time. The first pod waits for quorum indefinitely. CI never gets a working database environment.

The Tell

  • StatefulSet pod-0 is Running but not Ready.
  • No higher-numbered pods exist (the StatefulSet is waiting for pod-0 to be Ready before starting pod-1).
  • Pod-0's readiness probe failure log indicates it's waiting for something that requires pod-1 to exist.
  • podManagementPolicy: OrderedReady, or absent (which defaults to OrderedReady).

Common Misdiagnosis

| Looks Like | But Actually | How to Tell the Difference |
| --- | --- | --- |
| Application bug in pod-0 | OrderedReady deadlock | Readiness probe logic requires peer availability; not a code bug |
| Resource constraint | Management policy deadlock | Resources are available; pod-0 is Running but deliberately not Ready |
| Network issue | StatefulSet policy prevents multi-pod start | Pod-1 was never started; it's not unreachable — it doesn't exist yet |

The Fix (Generic)

  1. Immediate: Switch to podManagementPolicy: Parallel so all pods can start simultaneously. Note that podManagementPolicy is immutable on an existing StatefulSet, so kubectl patch will be rejected; instead, delete the StatefulSet without deleting its pods (kubectl delete statefulset mydb --cascade=orphan) and recreate it with Parallel set.
  2. Short-term: For new StatefulSets requiring quorum, use podManagementPolicy: Parallel.
  3. Long-term: Design readiness probes to not require a quorum for initial startup; use a startup probe for cluster formation that's more lenient than the steady-state readiness probe.
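The short- and long-term fixes combine into a spec like the following sketch (names and probe commands are invented): podManagementPolicy: Parallel lets all pods start at once so a quorum can form, a lenient startup probe covers cluster formation, and the readiness probe keeps its steady-state quorum check:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb
spec:
  serviceName: mydb
  replicas: 3
  podManagementPolicy: Parallel   # create all pods at once; quorum can form
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
        - name: mydb
          image: example/mydb:1.0   # hypothetical image
          startupProbe:
            # Lenient during cluster formation: the process is up,
            # even if quorum hasn't formed yet (hypothetical command).
            exec:
              command: ["/bin/check-alive"]
            failureThreshold: 30
            periodSeconds: 10
          readinessProbe:
            # Steady-state gate: only serve traffic once quorum exists.
            exec:
              command: ["/bin/check-quorum", "--min-members=2"]
            periodSeconds: 10
```

The startup probe suppresses readiness/liveness evaluation until it succeeds, so the strict quorum check only applies after the cluster has had a chance to form.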

Real-World Examples

  • Example 1: CockroachDB StatefulSet with OrderedReady. Readiness probe checked cluster health (requires quorum). First pod waited for the cluster to form, but the cluster couldn't form because only one pod was allowed to start. Deadlock until Parallel policy was applied.
  • Example 2: Kafka StatefulSet: each broker required ZooKeeper quorum (3 ZK pods). The ZK StatefulSet used OrderedReady, and each ZK pod required quorum to be Ready. ZK pod-0 waited for peers that the controller would never create. Manual intervention required.

War Story

New cluster setup: StatefulSet for a 3-node Elasticsearch cluster. Pod-0 started; health check required 2 active nodes. Pod-0 stayed in 0/1 Not Ready for 20 minutes. We stared at the logs: "cluster health: RED, active shards: 0/1." We scaled the statefulset to 3, thinking we'd start all 3 at once. Nothing happened — OrderedReady still waited for pod-0. Found the setting, changed to Parallel, all 3 pods started together, cluster formed in 45 seconds. The default OrderedReady is correct for databases that replicate sequentially; it's wrong for quorum-based systems that need N members simultaneously.

Cross-References

  • Topic Packs: k8s-ops
  • Footguns: k8s-ops/footguns.md — "StatefulSet with wrong podManagementPolicy"
  • Related Patterns: FP-032 (rollout hang — another "configuration prevents progress" shape), FP-014 (two-node quorum — quorum requirements are the root)