Understanding Distributed Systems Without a PhD
Topics: CAP theorem, consensus, eventual consistency, replication, partitions, CRDTs
Level: L1–L2 (Foundations → Operations)
Time: 60–75 minutes
Prerequisites: None (academic concepts made practical)
The Mission¶
Your coworker says "we need to ensure strong consistency across all our microservices." Your CTO says "we should use eventual consistency for better availability." You nod knowingly. You have no idea what either of them means.
Distributed systems theory sounds academic, but it drives every real-world decision about databases, caches, message queues, and microservice architecture. This lesson translates the theory into practical understanding — what the terms actually mean, when they matter, and how to make the right tradeoff for your system.
The Fundamental Problem¶
You have data on two servers. A user writes to Server A. Another user reads from Server B. What does Server B return?
             Write: "balance = $100"
User 1 ────────────────────────→ Server A
                                    │
                                    │ replication (takes time)
                                    ↓
User 2 ←──── Read: balance = ? ─ Server B
Three possible answers:
| Answer | Name | Meaning |
|---|---|---|
| $100 (latest) | Strong consistency | Server B waited for replication before answering |
| $80 (stale) | Eventual consistency | Server B answered with what it had; will catch up later |
| Error | Consistent but unavailable | Server B refused to answer because it wasn't sure |
Every distributed system makes this choice. There is no option that gives you "latest value, always available, and tolerant of network issues" simultaneously.
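The three answers in the table can be sketched as code. This is a toy model with hypothetical names (`Replica`, `read_eventual`, `read_strong`), not any real database API; it only illustrates the choice Server B faces:

```python
# Toy model of Server B from the diagram above (all names hypothetical).
# It holds a stale $80 balance until replication of the $100 write arrives.

class Replica:
    def __init__(self):
        self.value = 80          # stale balance before replication arrives
        self.caught_up = False   # has the $100 write replicated yet?

    def read_eventual(self):
        # AP-style: answer immediately with whatever we have
        return self.value

    def read_strong(self):
        # CP-style: refuse to answer until replication has caught up
        if not self.caught_up:
            raise RuntimeError("not caught up; try the primary")
        return self.value

    def apply_replication(self, value):
        self.value = value
        self.caught_up = True

b = Replica()
print(b.read_eventual())      # 80: stale but available
try:
    b.read_strong()           # raises: consistent but unavailable
except RuntimeError as e:
    print(e)
b.apply_replication(100)
print(b.read_strong())        # 100: consistent once replication lands
```

The same replica gives all three answers from the table depending only on which read policy you choose and whether replication has landed.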
CAP Theorem: The Actual Constraint¶
Eric Brewer conjectured in 2000 (and Gilbert and Lynch formally proved in 2002) that during a network partition, a distributed system can provide at most two of three guarantees:
- Consistency: Every read returns the most recent write
- Availability: Every request gets a response
- Partition tolerance: System continues operating despite network failures
Since network partitions DO happen in real systems, the practical choice is:
CP — During a partition, reject requests on the minority side (consistent but some requests fail). Example: etcd, ZooKeeper, PostgreSQL with synchronous replication.
AP — During a partition, serve requests on both sides (available but might return stale data). Example: Cassandra, DynamoDB, DNS.
Mental Model: Imagine two bank branches with a shared ledger. The phone line between them goes down. CP: both branches stop accepting deposits until the line is restored (no incorrect balances, but some customers can't bank). AP: both branches keep accepting deposits using their local ledger (all customers can bank, but the books might not match when the line comes back).
Gotcha: "CA" (consistent + available, not partition-tolerant) sounds attractive but is meaningless in distributed systems. If you don't tolerate partitions, you're running on a single server — not a distributed system. Network partitions are not optional; they're a fact of physics.
Consensus: How Nodes Agree¶
When multiple nodes need to agree on a value (who's the leader? what's the latest state?), they use a consensus algorithm. The most common:
Raft (etcd, Consul)¶
A leader is elected by majority vote. The leader handles all writes. Followers replicate. If the leader dies, a new election happens.
Node A (leader): Receives write → replicates to B and C → waits for majority ack → commits
Node B (follower): Receives replication → acknowledges → applies
Node C (follower): Receives replication → acknowledges → applies
Requires a majority (quorum) to operate. With 3 nodes, needs 2. With 5, needs 3.
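The quorum arithmetic above is simple enough to write down. A minimal sketch (helper names are my own, not part of any Raft implementation):

```python
# Quorum arithmetic for a Raft-style cluster: electing a leader or
# committing a write requires a strict majority of nodes.

def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1

def tolerable_failures(cluster_size: int) -> int:
    """How many nodes can fail while a quorum still exists."""
    return cluster_size - quorum(cluster_size)

for n in (1, 3, 5, 7):
    print(n, quorum(n), tolerable_failures(n))
# 3-node cluster: quorum 2, tolerates 1 failure
# 5-node cluster: quorum 3, tolerates 2 failures
```

Note that a 4-node cluster needs a quorum of 3 and tolerates only 1 failure, same as a 3-node cluster. That is why clusters are sized with odd numbers: the extra even node adds cost without adding fault tolerance.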
Paxos (Google Spanner, Chubby)¶
Theoretical predecessor to Raft. Same guarantees, much harder to understand and implement. Leslie Lamport published it in 1998; it took years for practical implementations to appear. Raft (2014) was explicitly designed to be an understandable alternative to Paxos.
Trivia: Lamport originally described Paxos using a metaphor about a parliament on the Greek island of Paxos. Reviewers found the metaphor so confusing that the paper went unpublished for nearly a decade before finally appearing in 1998. Lamport later wrote "Paxos Made Simple" (2001), which presents the same algorithm without the story. The algorithm never changed; only the storytelling did.
Eventual Consistency: Not as Scary as It Sounds¶
Eventually consistent means: if no new writes occur, all replicas will eventually converge to the same value. The "eventually" is usually milliseconds to seconds.
Write to Server A: balance = $100
t+0ms: Server A has $100, Server B has $80 (stale)
t+5ms: Replication arrives at Server B
t+5ms: Server A has $100, Server B has $100 (consistent)
For most web applications, 5ms of staleness is completely acceptable. You don't need strong consistency for:
- Social media feeds (seeing a post 5ms late is fine)
- Product catalogs (price update propagates in seconds)
- Analytics dashboards (real-time = "within a few seconds")
- DNS (TTL-based caching is the original eventual consistency)
You DO need strong consistency for:
- Financial transactions (account balance must be exact)
- Inventory counts (don't sell the last item twice)
- Leader election (can't have two leaders)
- Distributed locks (mutual exclusion must be guaranteed)
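Some stores (Cassandra, Dynamo-style systems) let you tune this per operation. The standard rule of thumb: with N replicas, if a write is acknowledged by W of them and a read consults R of them, the read and write sets overlap (so the read sees the latest write) whenever R + W > N. A one-line check:

```python
# Quorum-intersection rule of thumb for tunable-consistency stores:
# a read overlaps the latest write when R + W > N.

def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    return r + w > n

print(read_sees_latest_write(3, 2, 2))  # True: quorum reads + quorum writes
print(read_sees_latest_write(3, 1, 1))  # False: fast, but reads may be stale
```

This is why "QUORUM reads + QUORUM writes" behaves like strong consistency while "ONE + ONE" trades correctness for latency (edge cases like sloppy quorums aside).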
War Story: An e-commerce platform used eventually consistent reads for their shopping cart. Two tabs open, same user. Add item in tab A, refresh tab B — item not there. User adds it again. Now they have two of the same item in their cart. The fix was read-after-write consistency: after a write, route the user's reads to the primary for 5 seconds.
Quick Check: Do You Get It So Far?¶
Q: Twitter shows your tweet after a 2-second delay on another device. Is this a bug?
No. It's eventual consistency. The write reached the primary; the replica your other device reads from hasn't caught up yet. For social media, this is fine.
Q: Your bank app shows $100 balance. You withdraw $100. A second later it still shows $100. Is this a bug?
Yes. Financial transactions need strong consistency. Stale reads on account balances are unacceptable — you might overdraft.
Replication Strategies¶
Synchronous replication¶
The write isn't confirmed until the replica has it. Strong consistency, but:
- Slower (must wait for replica ACK)
- If replica is down, writes block or fail
Asynchronous replication¶
Fast, available, but:
- Replica can be behind (stale reads)
- If primary dies before replication, data is lost
Semi-synchronous¶
Compromise: the primary waits for at least one replica to acknowledge before confirming the write. You don't lose data (at least one copy exists besides the primary), but you're not blocked by the slowest replica.
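The three acknowledgment policies reduce to one question: given which replicas have ACKed, is the write confirmed? A sketch with a hypothetical helper (not any real database's API):

```python
# Sketch of the three replication modes above: when is a write confirmed,
# given which replicas have acknowledged it? (Hypothetical helper.)

def write_confirmed(mode: str, replica_acks: list[bool]) -> bool:
    if mode == "sync":          # every replica must have the write
        return all(replica_acks)
    if mode == "semi-sync":     # at least one replica besides the primary
        return any(replica_acks)
    if mode == "async":         # confirm immediately; replicas catch up later
        return True
    raise ValueError(f"unknown mode: {mode}")

acks = [True, False]            # replica 1 ACKed, replica 2 is down
print(write_confirmed("sync", acks))       # False: blocked by the down replica
print(write_confirmed("semi-sync", acks))  # True: one extra copy is enough
print(write_confirmed("async", acks))      # True: confirmed before any ACK
```

The `acks = [True, False]` case is exactly the scenario where semi-sync earns its keep: one replica is down, yet writes proceed and a second copy of the data still exists.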
Practical Patterns for Operators¶
Read-after-write consistency¶
User writes, then immediately reads. In an eventually consistent system, they might read their own stale data:
User writes to Primary: "Update email to alice@new.com"
User reads from Replica: "Email: alice@old.com" ← stale!
User: "My update didn't work!"
Fix: Route reads-after-writes to the primary for a short window (1-5 seconds).
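The routing fix can be sketched in a few lines. This is a simplified illustration (names like `record_write` and `choose_server` are invented for the example), assuming the load balancer or application can see which user issued each request:

```python
# Read-after-write routing sketch: after a user writes, pin that user's
# reads to the primary for a short window so they see their own write.

import time

PIN_WINDOW = 5.0                      # seconds to route reads to the primary
last_write: dict[str, float] = {}     # user_id -> timestamp of last write

def record_write(user_id: str) -> None:
    last_write[user_id] = time.monotonic()

def choose_server(user_id: str) -> str:
    wrote_at = last_write.get(user_id)
    if wrote_at is not None and time.monotonic() - wrote_at < PIN_WINDOW:
        return "primary"              # their own write is guaranteed visible
    return "replica"                  # stale-tolerant reads can use replicas

record_write("alice")
print(choose_server("alice"))         # primary (within the 5-second window)
print(choose_server("bob"))           # replica (no recent write)
```

Only the writer is pinned; everyone else keeps reading from replicas, so the extra load on the primary stays small.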
The "last writer wins" problem¶
Two users write to the same key simultaneously on different replicas:
Replica A: balance = $100 (User 1 deposits $20 to the $80 balance)
Replica B: balance = $70 (User 2 withdraws $10 from the $80 balance)
Merge: which value wins? $100? $70? Neither is correct: applying both operations gives $90.
Fix: Use CRDTs (Conflict-free Replicated Data Types) for operations that must merge correctly. Or use consensus for operations that must be serialized (financial transactions).
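The simplest CRDT is a grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the element-wise maximum, which is commutative, associative, and idempotent, so replicas converge regardless of merge order. A minimal sketch for illustration, not a production library (a real balance with withdrawals would need a PN-Counter, which pairs two G-Counters for increments and decrements):

```python
# Minimal G-Counter CRDT: per-replica slots, merge by element-wise max.

class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        # Each replica only ever touches its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: safe to apply in any order, any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

a, b = GCounter("A"), GCounter("B")
a.increment(20)               # concurrent update on replica A
b.increment(10)               # concurrent update on replica B
a.merge(b)
b.merge(a)
print(a.value(), b.value())   # 30 30: both replicas converge, nothing lost
```

Compare this with last-writer-wins, where one of the two concurrent updates would simply be discarded.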
Flashcard Check¶
Q1: CAP theorem — what's the practical choice?
During a network partition: CP (reject some requests, stay consistent) or AP (serve all requests, risk stale data). "CA" is not possible in distributed systems.
Q2: What is eventual consistency?
All replicas converge to the same value if writes stop. "Eventually" is typically milliseconds. Acceptable for most reads; not for financial transactions.
Q3: Why can't you have 2 leaders in a consensus system?
Quorum prevents it. A leader needs majority approval. During a partition, only the side with the majority can elect a leader. The minority side goes read-only.
Q4: Synchronous vs asynchronous replication — trade-off?
Sync: strong consistency, slower writes, blocked if replica is down. Async: fast writes, available, but replicas can be behind and data can be lost if primary dies.
Q5: When do you need strong consistency?
Financial transactions, inventory counts, leader election, distributed locks — any operation where "slightly stale" means "incorrect."
Cheat Sheet¶
Consistency Spectrum¶
Strong ←──────────────────────────────────────────→ Eventual

Linearizable      Sequential      Causal      Eventual
(Spanner)         (Raft)          (CRDTs)     (Cassandra)

Slowest, safest                 Fastest, most available
System Classification¶
| System | Consistency | During partition |
|---|---|---|
| etcd | Strong (CP) | Minority rejects writes |
| PostgreSQL (sync rep) | Strong (CP) | Writes block if replica down |
| Cassandra | Tunable | Continues on both sides |
| DynamoDB | Eventual (AP) | Continues, may diverge |
| Redis Sentinel | Depends on config | Split-brain if quorum wrong |
| DNS | Eventual (AP) | Serves cached data |
Takeaways¶
- CAP is a tradeoff, not a menu. During partitions, choose consistency (reject requests) or availability (serve possibly-stale data). You can't have both.
- Eventual consistency is fine for most things. Social feeds, product catalogs, dashboards — 5ms of staleness is invisible. Don't pay the cost of strong consistency unless you need it.
- Quorum prevents split-brain. Majority vote ensures only one leader. Always use odd-numbered clusters.
- Replication mode determines the trade-off. Synchronous = safe but slow. Asynchronous = fast but risk of data loss. Semi-synchronous = practical compromise.
- This isn't just theory. Every database choice, every cache decision, every message queue configuration is a distributed systems decision — whether you realize it or not.
Related Lessons¶
- The Split-Brain Nightmare — when consensus fails in practice
- The Database That Wouldn't Start — single-node database recovery
- When the Queue Backs Up — message delivery guarantees