# Pattern: StatefulSet OrderedReady Deadlock

**ID:** FP-037 · **Family:** Configuration Landmine · **Frequency:** Uncommon · **Blast Radius:** Single Service · **Detection Difficulty:** Subtle
## The Shape
StatefulSets with `podManagementPolicy: OrderedReady` (the default) create pods sequentially and wait for each pod to be Ready before creating the next. If a pod's readiness depends on a peer that hasn't started yet (e.g., pod-0's readiness probe requires a quorum partner), the StatefulSet deadlocks: pod-0 waits for pod-1 to exist, but pod-1 won't be created until pod-0 is Ready.
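The shape can be sketched as a minimal manifest, where the readiness probe requires a peer that `OrderedReady` will never start. All names, ports, and the probe command here are hypothetical illustrations, not from a real deployment:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb                # hypothetical name
spec:
  serviceName: mydb
  replicas: 3
  # podManagementPolicy omitted -> defaults to OrderedReady:
  # pod-1 is not created until pod-0 reports Ready.
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
        - name: mydb
          image: mydb:latest             # hypothetical image
          readinessProbe:
            exec:
              # hypothetical check: passes only with >= 2 cluster members,
              # which requires pod-1 to exist -> deadlock
              command: ["/bin/sh", "-c", "check-quorum --min-members=2"]
            periodSeconds: 10
```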
## How You'll See It

### In Kubernetes
```
$ kubectl get pods
NAME     READY   STATUS    RESTARTS
mydb-0   0/1     Running   0          # readiness probe failing: waiting for a quorum peer
```

Note that `mydb-1` does not appear at all: the StatefulSet won't create it until `mydb-0` is Ready. `kubectl describe pod mydb-0` shows the readiness probe failing because it requires a quorum (at least 2 members). Deadlock.
### In CI/CD
A database CI environment uses a StatefulSet for a 3-node cluster. Each node requires at least one peer to form quorum before its readiness probe passes. OrderedReady means only one pod starts at a time. The first pod waits for quorum indefinitely. CI never gets a working database environment.
## The Tell
StatefulSet pod-0 is `Running` but not `Ready`, and no higher-numbered pods exist (the StatefulSet is waiting for pod-0 to be Ready before creating pod-1). Pod-0's readiness-probe failure logs indicate it is waiting for something that requires pod-1 to exist. The spec has `podManagementPolicy: OrderedReady` (or omits the field entirely, which defaults to `OrderedReady`).
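To confirm the tell, you can read the policy straight off the object (assuming a StatefulSet named `mydb`, as in the examples above):

```shell
# Prints "Parallel" or "OrderedReady"; empty output means the field
# is unset, which also defaults to OrderedReady.
kubectl get statefulset mydb -o jsonpath='{.spec.podManagementPolicy}'
```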
## Common Misdiagnosis
| Looks Like | But Actually | How to Tell the Difference |
|---|---|---|
| Application bug in pod-0 | OrderedReady deadlock | Readiness probe logic requires peer availability; not a code bug |
| Resource constraint | Management policy deadlock | Resources available; pod-0 is Running but deliberately not Ready |
| Network issue | StatefulSet policy prevents multi-pod start | Pod-1 was never started; it's not unreachable — it doesn't exist yet |
## The Fix (Generic)
- **Immediate:** Switch to `podManagementPolicy: Parallel` so all pods can start simultaneously. Note that this field is immutable on a live StatefulSet, so a direct `kubectl patch` will be rejected; instead, delete the StatefulSet while leaving its pods in place (`kubectl delete statefulset mydb --cascade=orphan`) and recreate it with `Parallel`. Alternatively, temporarily relax pod-0's readiness probe so it can report Ready and unblock pod-1.
- **Short-term:** For new StatefulSets whose pods need peers to form a quorum, set `podManagementPolicy: Parallel` from the start.
- **Long-term:** Design readiness probes so they don't require a quorum during initial startup; use a startup probe for cluster formation that's more lenient than the steady-state readiness probe.
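The short- and long-term fixes combined can be sketched as a manifest like the following (names, ports, image, and probe commands are illustrative assumptions, not from the original):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb                # hypothetical name
spec:
  serviceName: mydb
  replicas: 3
  podManagementPolicy: Parallel   # all pods start at once; quorum can form
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
        - name: mydb
          image: mydb:latest          # hypothetical image
          startupProbe:               # lenient: only checks the process is up,
            tcpSocket:                # so cluster formation isn't blocked
              port: 5432              # hypothetical port
            failureThreshold: 30
            periodSeconds: 10
          readinessProbe:             # strict steady-state check (quorum)
            exec:
              command: ["/bin/sh", "-c", "check-quorum"]   # hypothetical script
            periodSeconds: 10
```

While the startup probe is running, the readiness probe is suspended, so each pod gets a generous window to find its peers before the strict quorum check takes over.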
## Real-World Examples
- **Example 1:** CockroachDB StatefulSet with `OrderedReady`. The readiness probe checked cluster health, which requires quorum. The first pod waited for the cluster to form, but the cluster couldn't form because only one pod was allowed to start. Deadlock until the `Parallel` policy was applied.
- **Example 2:** Kafka StatefulSet: each broker required a ZooKeeper quorum (3 ZK pods), and the ZK StatefulSet used `OrderedReady`. Each ZK pod required quorum to be Ready, so ZK pod-0 waited for a peer that the StatefulSet would never start. Manual intervention required.
## War Story
New cluster setup: a StatefulSet for a 3-node Elasticsearch cluster. Pod-0 started; its health check required 2 active nodes. Pod-0 sat at `0/1 Not Ready` for 20 minutes. We stared at the logs: "cluster health: RED, active shards: 0/1." We scaled the StatefulSet to 3, thinking we'd start all three at once. Nothing happened: `OrderedReady` still waited for pod-0. Found the setting, changed to `Parallel`, all three pods started together, and the cluster formed in 45 seconds. The default `OrderedReady` is correct for databases that replicate sequentially; it's wrong for quorum-based systems that need N members simultaneously.
## Cross-References
- Topic Packs: k8s-ops
- Footguns: k8s-ops/footguns.md — "StatefulSet with wrong podManagementPolicy"
- Related Patterns: FP-032 (rollout hang — another "configuration prevents progress" shape), FP-014 (two-node quorum — quorum requirements are the root)