Distributed Systems — Trivia & Interesting Facts¶

Surprising, historical, and little-known facts about distributed systems.

The CAP theorem was first stated as a conjecture on a single PowerPoint slide¶

Eric Brewer presented the CAP conjecture at the 2000 ACM Symposium on Principles of Distributed Computing in a keynote talk. It was a single slide claiming that a distributed system can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance. Seth Gilbert and Nancy Lynch formally proved it in 2002, making it one of the few computer science theorems that started as a conference slide.

The Paxos algorithm was described using a fictional Greek parliament¶

Leslie Lamport first wrote "The Part-Time Parliament" in 1989, describing a consensus algorithm using the metaphor of legislators on the Greek island of Paxos. The paper was rejected for being "too whimsical." It was finally published in 1998, and Lamport later wrote the plainer "Paxos Made Simple" in 2001, which famously begins: "The Paxos algorithm, when presented in plain English, is very simple."

Google's MapReduce paper launched the big data revolution — then Google stopped using MapReduce¶

Google published the MapReduce paper in 2004, inspiring the creation of Hadoop and the entire "big data" ecosystem. However, Google itself moved away from MapReduce internally by the early 2010s, replacing it with systems like FlumeJava, MillWheel, and eventually Cloud Dataflow. By the time the industry had fully adopted the MapReduce pattern, its inventor had already abandoned it.

The "Two Generals Problem" proves that reliable communication is impossible¶

Formalized by Jim Gray in 1978, the Two Generals Problem shows that no protocol can guarantee agreement between two parties communicating over an unreliable channel using a finite number of messages. This fundamental impossibility result means that every distributed system must accept some level of uncertainty — a fact that continues to surprise engineers new to distributed systems.

Lamport clocks were invented because wall clocks cannot be trusted in distributed systems¶

Leslie Lamport's 1978 paper "Time, Clocks, and the Ordering of Events in a Distributed System" is the most cited paper in distributed systems. Lamport showed that physical clocks are inherently unreliable for ordering events across machines and introduced logical clocks — simple counters that track causal ordering. This paper earned him the 2013 Turing Award.

Amazon's 2004 holiday outage led to the creation of microservices architecture¶

Amazon's monolithic application struggled during the 2004 holiday season, leading to cascading failures. This prompted Amazon to decompose their application into small, independent services — each owning its own data store. Werner Vogels documented this transformation, and it became the template for what the industry later called "microservices architecture."

The Byzantine Generals Problem was solved in 1982 but is still considered impractical¶

Lamport, Shostak, and Pease published the Byzantine fault tolerance solution in 1982, showing that a system can tolerate up to f Byzantine (arbitrarily malicious) failures with 3f+1 total nodes. The algorithm requires O(n^2) message complexity, which was considered impractical for decades. Bitcoin solved a probabilistic version of the problem in 2008 using proof-of-work, trading determinism for scalability.

Raft was explicitly designed because Paxos was too hard to understand¶

Diego Ongaro and John Ousterhout published the Raft consensus algorithm in 2014, explicitly motivated by Paxos being too difficult to teach and implement correctly. Their user study showed that students scored 40% better on comprehension questions about Raft versus Paxos. Raft now powers etcd (Kubernetes), CockroachDB, Consul, and dozens of other production systems.

Network partitions happen more often than most engineers think¶

A 2014 study by Bailis and Kingsbury, "The Network is Reliable," cataloged dozens of real-world partition events at companies like Google, Amazon, and Microsoft. They found that network partitions are not rare black-swan events but occur regularly — even within a single data center. This finding reinforced the practical importance of partition-tolerant system design.

Vector clocks require O(n) space per event, which makes them impractical at scale¶

Vector clocks track causal ordering by maintaining a counter per node. In a system with 1,000 nodes, every single event must carry a vector of 1,000 integers. This overhead led to the development of alternatives like dotted version vectors, interval tree clocks, and hybrid logical clocks, which provide weaker but more space-efficient ordering guarantees.