Quiz: Distributed Systems Fundamentals
7 questions
L1 (4 questions)
1. What is the CAP theorem and how does it apply to choosing a database for a microservices architecture?
Show answer
CAP states a distributed system can guarantee at most two of: Consistency (every read sees the latest write), Availability (every request gets a response), Partition tolerance (system works despite network splits). Since network partitions are unavoidable, the real choice is CP (consistent but may reject requests during partition, e.g., etcd, ZooKeeper) vs AP (available but may serve stale data, e.g., Cassandra, DynamoDB in eventual-consistency mode). Choose based on your tolerance for stale reads vs failed writes.
2. What is the split-brain problem and how do distributed systems prevent it?
Show answer
Split-brain occurs when a network partition causes two subsets of nodes to each believe they are the active cluster, leading to conflicting writes and data divergence. Prevention mechanisms: quorum-based voting (a majority is required to accept writes, so a 3-node cluster tolerates 1 failure), fencing (STONITH/fence agents power off unreachable nodes), lease-based leadership (leader must renew lease within timeout), and witness/tiebreaker nodes in even-numbered clusters.
3. What are idempotent operations and why are they critical in distributed systems?
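To make the quorum rule from the split-brain answer concrete, here is a minimal sketch of the majority arithmetic. The function names are hypothetical and chosen for illustration; real systems layer leader election and fencing on top of this check.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return cluster_size - quorum_size(cluster_size)


def can_accept_writes(reachable_nodes: int, cluster_size: int) -> bool:
    """A partition side may accept writes only if it holds a majority."""
    return reachable_nodes >= quorum_size(cluster_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"nodes={n} quorum={quorum_size(n)} tolerates={tolerated_failures(n)}")
    # A 4-node cluster split 2/2: neither side holds a majority.
    print(can_accept_writes(2, 4))
```

Under this rule a 4-node cluster tolerates no more failures than a 3-node one, and an even 2/2 split leaves neither side able to write, which is the situation a witness or tiebreaker node is meant to resolve.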
Show answer
An idempotent operation produces the same result regardless of how many times it is executed. This is critical because in distributed systems, network failures force retries; without idempotency, a retry could create a duplicate order, charge a card twice, or increment a counter extra times. Techniques: use unique request IDs (idempotency keys) that the server deduplicates, prefer PUT (set to value) over POST (create new), design state machines where re-applying a transition is a no-op if already applied.
4. What is a circuit breaker pattern and when should you use it in a microservices architecture?
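As an illustration of the idempotency-key technique from the answer on idempotent operations, the sketch below deduplicates retried requests on the server side. `PaymentService` and its in-memory dictionary are hypothetical stand-ins; a real service would store keys durably and expire them.

```python
import uuid


class PaymentService:
    """Toy in-memory service (assumption: a single process, no persistence)."""

    def __init__(self):
        self.charges = []    # every charge actually performed
        self._results = {}   # idempotency key -> stored response

    def charge(self, idempotency_key: str, card: str, amount_cents: int) -> dict:
        # A retry with a known key returns the stored response
        # instead of charging the card a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        receipt = {
            "charge_id": len(self.charges) + 1,
            "card": card,
            "amount_cents": amount_cents,
        }
        self.charges.append(receipt)
        self._results[idempotency_key] = receipt
        return receipt


if __name__ == "__main__":
    service = PaymentService()
    key = str(uuid.uuid4())  # the client generates one key per logical request
    first = service.charge(key, "card-123", 1999)
    retry = service.charge(key, "card-123", 1999)  # e.g. after a timeout
    print(first == retry, len(service.charges))
```

The client generates the key once per logical operation and reuses it for every retry, so the retry is safe even when the first response was lost in transit.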
Show answer
A circuit breaker monitors calls to a downstream service and trips open when failures exceed a threshold (e.g., 50% failure rate over 10 seconds). When open, calls fail immediately without contacting the service, preventing cascade failures and giving the downstream time to recover. After a timeout, it enters a half-open state and allows a test request: success closes the circuit again, failure reopens it. Use it for any synchronous call to another service, external API, or database. Libraries: Hystrix (in maintenance mode, no longer actively developed), Resilience4j, Polly. Always pair with fallback behavior (cached data, degraded response, queue for later).
L2 (3 questions)
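Returning to the circuit breaker answer that closes the L1 section, the closed, open, and half-open states can be sketched as follows. This is a simplified illustration, not how Resilience4j or Polly are implemented: it trips on a count of consecutive failures rather than a failure rate over a time window, and the class and parameter names are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call in half-open state reopens the circuit immediately.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller would typically catch `CircuitOpenError` and return the fallback (cached data or a degraded response) instead of propagating the error.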
1. What is the difference between Raft and Paxos consensus algorithms, and why is Raft more commonly used in modern infrastructure tools?
Show answer
Both achieve consensus in distributed systems but differ in design philosophy. Paxos is theoretically elegant but notoriously hard to implement correctly: it separates the protocol into phases that are difficult to map to real code. Raft was designed for understandability: it breaks consensus into leader election, log replication, and safety, with a single strong leader. Raft is used in etcd, Consul, CockroachDB, and TiKV because engineers can reason about and debug it. The performance difference is negligible for most workloads.
2. Explain the difference between strong consistency, eventual consistency, and causal consistency. When would you choose each?
Show answer
Strong consistency: reads always return the latest write (linearizability). Use for financial transactions, distributed locks. Eventual consistency: given no new writes, all replicas converge to the same value. Use for DNS, social media feeds, caches. Causal consistency: preserves cause-and-effect ordering (if A causes B, everyone sees A before B) but allows concurrent operations to be seen in different orders. Use for collaborative editing, comment threads. Moving from strong to causal to eventual consistency trades stronger guarantees for lower latency and higher availability.
3. What is a vector clock and how does it differ from a Lamport timestamp for tracking causality?