Mental Model: CAP Theorem¶
Category: System Behavior
Origin: Eric Brewer, 2000 (conjecture at the PODC symposium); formally proven by Gilbert and Lynch, 2002
One-liner: A distributed system can guarantee at most two of Consistency, Availability, and Partition Tolerance — and since network partitions are inevitable, the real operational choice is always between Consistency and Availability.
The Model¶
CAP Theorem states that any distributed data store can provide at most two of three guarantees simultaneously: Consistency (every read returns the most recent write or an error), Availability (every request receives a non-error response, though it may not be the latest data), and Partition Tolerance (the system continues operating despite network partitions — messages being lost or delayed between nodes).
The theorem's practical bite comes from a constraint that engineers sometimes miss: partition tolerance is not optional. Networks fail. Packets drop. Switches crash. Any system that claims CA (consistent and available, not partition-tolerant) is a single-node system or one that silently breaks during any network hiccup. In a distributed system operating at production scale, you will experience partitions. Therefore, the real choice is always between CP (consistent, partition-tolerant — sacrifice availability) and AP (available, partition-tolerant — sacrifice consistency).
A CP system returns an error or times out rather than serve stale data during a partition. etcd and ZooKeeper are CP: they refuse writes that cannot achieve quorum, and Kubernetes relies on this for its control plane. If etcd cannot reach a majority of its members, it stops accepting writes — your cluster becomes read-only. This is the right tradeoff for configuration state: a stale network policy enforced across the cluster is worse than a brief control plane outage.
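The quorum rule behind this behavior is simple majority arithmetic. A minimal sketch (illustrative only, not etcd's actual implementation) of the decision a CP store makes during a partition:

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """A node may accept writes only if it can reach a strict
    majority of the cluster (counting itself)."""
    return reachable > cluster_size // 2

# 5-node ensemble split 3/2 by a partition:
print(has_quorum(3, 5))  # majority side keeps accepting writes -> True
print(has_quorum(2, 5))  # minority side rejects writes -> False

# An even split loses quorum on BOTH sides, which is one reason
# odd cluster sizes are preferred:
print(has_quorum(2, 4))  # -> False
```

This is why a 5-node ensemble tolerates two failures but a 4-node ensemble still tolerates only one.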
An AP system continues serving reads and writes during a partition, accepting that different nodes may diverge. Cassandra and CouchDB are AP by default: a write accepted by one node will eventually propagate to others, but during a partition, two clients might read different values. Amazon DynamoDB offers tunable consistency — you can choose strongly consistent reads (CP) or eventually consistent reads (AP) per request, paying higher latency for the CP path.
Boundary conditions: CAP describes binary guarantees, but real systems operate on a spectrum. "Consistency" in CAP is linearizability — a stronger model than eventual consistency, read-your-writes, or causal consistency. Many systems provide useful consistency models that are weaker than full linearizability but stronger than pure AP. CAP also applies per-operation, not per-system: a system might be CP for writes and AP for reads. Use CAP to understand failure-time behavior, not steady-state behavior.
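The per-operation nature of the tradeoff can be sketched with a toy replicated register (a hedged illustration, not any real database's implementation; stores like DynamoDB expose the same choice as a per-request option):

```python
class Unavailable(Exception):
    """Raised when a strongly consistent read cannot reach quorum."""

class Replica:
    def __init__(self):
        self.value = None
        self.version = 0

def strong_read(replicas, quorum):
    """CP-style read: contact a quorum and return the newest value,
    or fail if too few replicas are reachable."""
    reachable = [r for r in replicas if r is not None]
    if len(reachable) < quorum:
        raise Unavailable("no quorum")  # sacrifices availability
    return max(reachable, key=lambda r: r.version).value

def eventual_read(replicas):
    """AP-style read: return whatever the first reachable replica
    holds, even if it is stale."""
    for r in replicas:
        if r is not None:
            return r.value  # may be stale -> sacrifices consistency
    raise Unavailable("all replicas down")

# Three replicas; a partition cuts one off (None) and one is stale.
a, b = Replica(), Replica()
a.value, a.version = "v2", 2      # holds the latest write
b.value, b.version = "v1", 1      # stale
replicas = [None, b, a]           # third replica is unreachable

print(strong_read(replicas, quorum=2))  # 'v2' - newest value in a quorum
print(eventual_read(replicas))          # 'v1' - fast, but stale
```

The same system answers both calls: the CP path pays latency (and risks an `Unavailable` error) for freshness; the AP path answers immediately with whatever is at hand.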
Visual¶
```text
               Consistency
                   /\
                  /  \
                 /    \
                / RDBMS \
               / (single \
              /   node)   \
             /─────────────\
            /  CP       AP  \
           / etcd  Cassandra \
          /ZooKeeper  DynamoDB\
         / Redis*    (default) \
        /───────────────────────\
   Availability ───── Partition Tolerance
```
*Redis with replication (Sentinel or Redis Cluster) behaves as CP — it stops accepting writes without a majority — though its asynchronous replication can lose acknowledged writes during failover; standalone Redis is a single node (CA), so CAP does not apply
During a Network Partition:

CP System (etcd):

```text
┌─────────┐  ✗ partition ✗  ┌─────────┐
│ Node A  │─────────────────│ Node B  │
│(primary)│                 │(replica)│
└─────────┘                 └─────────┘
Writes to A: REJECTED (no quorum)  ← sacrifices Availability
```

AP System (Cassandra):

```text
┌─────────┐  ✗ partition ✗  ┌─────────┐
│ Node A  │─────────────────│ Node B  │
│  v=42   │                 │  v=41   │
└─────────┘                 └─────────┘
Writes to A: ACCEPTED, writes to B: ACCEPTED
Result: diverged state, reconciled after partition heals  ← sacrifices Consistency
```
```mermaid
flowchart TD
    CAP["CAP Theorem\nPick two of three"]
    C["Consistency\nevery read = latest write"]
    A["Availability\nevery request gets a response"]
    P["Partition Tolerance\nsurvives network splits"]
    CAP --> C
    CAP --> A
    CAP --> P
    CP["CP Systems\netcd, ZooKeeper, Redis Cluster"]
    AP["AP Systems\nCassandra, DynamoDB, CouchDB"]
    C --- CP
    P --- CP
    A --- AP
    P --- AP
    style CP fill:#36f,color:#fff
    style AP fill:#f80,color:#fff
    style P fill:#5a5,color:#fff
```
When to Reach for This¶
- When choosing a database or distributed store for a new service: the CAP choice should flow from the consistency requirements of the data, not from benchmark numbers
- When debugging split-brain scenarios: two parts of a cluster accepting conflicting writes is the AP side of the CAP tradeoff manifesting as a bug
- When evaluating why a service becomes unavailable during a network event (CP system protecting consistency) vs. why data becomes inconsistent (AP system protecting availability)
- When designing multi-region deployments: cross-region latency makes partitions more common, making the CP vs. AP tradeoff more consequential
- When a team debates "why did etcd stop accepting writes?" — the answer is always that it lost quorum (CP behavior working as designed)
When NOT to Use This¶
- For single-node systems: CAP only applies to distributed systems; a single PostgreSQL instance on one machine is not subject to partition tradeoffs
- As a complete consistency model: CAP's "C" is linearizability specifically. Systems can offer read-your-writes, monotonic reads, or causal consistency, which are weaker but often sufficient; these nuances are outside CAP's scope
- As the only framework for distributed system design: CAP describes failure-time behavior but says nothing about steady-state latency tradeoffs — use PACELC for those
Applied Examples¶
Example 1: Kubernetes Control Plane During a Network Partition¶
A Kubernetes cluster runs a 5-node etcd ensemble (3 in zone A, 2 in zone B). A network partition isolates zone B from zone A. The 3 nodes in zone A retain quorum; the 2 nodes in zone B lose quorum.
CAP behavior: etcd is a CP system. The zone B nodes refuse to accept new writes — they cannot confirm they have the latest state. The Kubernetes API servers backed by zone A nodes continue accepting writes. Any API server in zone B connected only to zone B etcd nodes will fail write requests.
In practice: deployments triggered via the API server in zone B fail; the control plane is partially unavailable. This is the correct behavior for a system managing cluster state — accepting writes in zone B and diverging would risk two different sets of nodes enforcing contradictory policies after the partition heals.
Resolution: the network partition heals, zone B nodes rejoin and sync from zone A, and the system returns to normal. No data was lost; a brief availability window was sacrificed to preserve consistency.
Example 2: Cassandra Shopping Cart Under a Partition¶
An e-commerce platform uses Cassandra (AP) to store shopping carts. A network partition separates datacenter East from datacenter West. Clients in each region continue writing to their local nodes.
A user in the East adds item A to their cart. The write is accepted by East nodes. The same user (via a mobile app routed to West) adds item B. The write is accepted by West nodes. During the partition, these two writes are not synchronized.
CAP behavior: Cassandra accepts both writes (AP). After the partition heals, Cassandra's conflict resolution (last-write-wins by timestamp, or application-level merging) determines the final cart state. If timestamps are imprecise, item A or item B might be silently dropped.
The correct engineering response is to design for this: use a merge strategy (keep both items in the cart) rather than last-write-wins. AP systems require application-level conflict resolution design — the developer cannot ignore consistency; they must handle it explicitly.
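The two reconciliation strategies can be sketched in a few lines (a hedged illustration with hypothetical cart records, not Cassandra's actual merge machinery):

```python
def lww_merge(east, west):
    """Last-write-wins: the replica with the later timestamp wins
    outright; concurrent additions on the loser are silently dropped."""
    return east if east["ts"] >= west["ts"] else west

def union_merge(east, west):
    """Set-union merge: keep every item seen on either side.
    (Handling deletions would need extra bookkeeping, e.g. tombstones.)"""
    return {"items": sorted(set(east["items"]) | set(west["items"])),
            "ts": max(east["ts"], west["ts"])}

# Concurrent writes accepted on opposite sides of the partition:
east = {"items": ["item_a"], "ts": 1001}  # user adds item A in East
west = {"items": ["item_b"], "ts": 1002}  # same user adds item B in West

print(lww_merge(east, west)["items"])    # ['item_b'] - item A silently lost
print(union_merge(east, west)["items"])  # ['item_a', 'item_b'] - both kept
```

The union merge is the shopping-cart-friendly choice: the worst case is a deleted item reappearing, which is recoverable, whereas a silently dropped item is a lost sale.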
The Junior vs Senior Gap¶
| Junior | Senior |
|---|---|
| Chooses databases based on performance benchmarks and familiarity | Asks "what is the consistency requirement of this data?" before evaluating database options |
| Surprised when etcd stops accepting writes during a network event | Expects this as correct CP behavior; has a runbook for quorum loss |
| Treats "eventual consistency" as a vague marketing phrase | Defines exactly what consistency guarantee the application requires (read-your-writes? causal? linearizable?) and maps it to a database offering |
| Investigates Cassandra split-brain as a bug | Recognizes it as the AP tradeoff and designs conflict resolution at the application layer |
Connections¶
- Complements: PACELC — CAP describes the partition-time tradeoff; PACELC extends it to describe the latency-vs-consistency tradeoff in the common case (no partition), giving a more complete picture of distributed system design
- Complements: Failure Domains — partitions occur along failure domain boundaries (AZ, rack, network segment); understanding failure domains tells you how often your CAP tradeoff will be exercised
- Tensions: Graceful Degradation — AP systems degrade gracefully under partitions at the cost of consistency; CP systems degrade by becoming unavailable, which is less graceful but safer for state-sensitive data
- Topic Packs: distributed-systems, kubernetes
- Case Studies: coredns-timeout-pod-dns (DNS unavailability during cluster events is a CAP availability vs. consistency tradeoff in the service discovery layer)