Db Replication¶
18 cards — 🟢 4 easy | 🟡 8 medium | 🔴 6 hard
🟢 Easy (4)¶
1. How does PostgreSQL stream changes from a primary to replicas?
Show answer
Via WAL (Write-Ahead Log) shipping. Every change is written to WAL first, then streamed to replicas for replay.2. What is the key difference between synchronous and asynchronous replication?
Show answer
In synchronous replication, the primary waits for at least one replica to confirm the write before committing. In asynchronous, the primary commits immediately without waiting, which is faster but risks data loss on failover.3. What PostgreSQL system view shows the current replication status and lag?
Show answer
pg_stat_replication. It shows client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, and can be used to calculate replication lag in bytes.4. What is the simplest way to monitor replication lag from the primary side?
Show answer
Query pg_stat_replication on the primary: SELECT client_addr, state, replay_lag FROM pg_stat_replication; The replay_lag column (PG 10+) shows the time since the last WAL replayed on each replica. Alert if replay_lag exceeds your SLA threshold (e.g., > 5 seconds for warning, > 30 seconds for critical).🟡 Medium (8)¶
1. Name three common causes of replication lag in PostgreSQL.
Show answer
Under-provisioned replica (CPU/IO), long-running queries on the replica blocking WAL replay, network saturation between primary and replica, and large transactions generating excessive WAL.2. How do you promote a PostgreSQL standby to primary (two methods)?
Show answer
Using pg_ctl promote -D /path/to/data on the command line, or via SQL with SELECT pg_promote(). Both convert the standby from recovery mode to a read-write primary.3. What is split-brain in database replication and why is it dangerous?
Show answer
Split-brain occurs when two nodes both believe they are the primary and accept writes simultaneously. This causes data divergence that is extremely difficult to reconcile, potentially leading to data loss or corruption.4. Name three strategies for preventing split-brain in database clusters.
Show answer
Use a consensus mechanism (e.g., Patroni with etcd/ZooKeeper/Consul), implement fencing/STONITH to power off the old primary, and require quorum-based decisions where a majority must agree before promotion.5. What are the practical trade-offs between synchronous and asynchronous replication?
Show answer
Synchronous: zero data loss on failover, but higher write latency (every commit waits for replica ACK) and reduced availability (if the sync replica goes down, writes block unless you configure synchronous_standby_names with multiple candidates).Asynchronous: lower write latency and no availability impact from replica failures, but potential data loss on failover equal to the replication lag at crash time.
6. What is the difference between WAL shipping (physical) and logical replication?
Show answer
WAL shipping sends raw WAL bytes to replicas that replay them identically — the replica is a byte-for-byte copy. Logical replication decodes WAL into logical change events (INSERT, UPDATE, DELETE) and applies them, allowing selective table replication, cross-version replication, and different indexes or schemas on the subscriber. Logical replication has higher overhead but more flexibility.7. What are replication slots and why do they matter for WAL retention?
Show answer
A replication slot tells the primary to retain WAL segments until the connected replica has consumed them, preventing the primary from recycling WAL that the replica still needs. Without slots, a slow or disconnected replica may fall too far behind and require a full base backup to resync. The risk: a dead replica with an active slot causes unbounded WAL accumulation on the primary, filling the disk.8. What are the key steps to set up streaming replication in PostgreSQL?
Show answer
1. On the primary: set wal_level = replica, max_wal_senders >= number of replicas, create a replication user.2. Take a base backup: pg_basebackup -h primary -D /data -U replicator -Fp -Xs -P.
3. On the replica: configure primary_conninfo in postgresql.conf (or recovery.conf for PG < 12) and create standby.signal.
4. Start the replica — it connects and streams WAL continuously.
🔴 Hard (6)¶
1. How does MySQL semi-synchronous replication differ from PostgreSQL synchronous replication?
Show answer
MySQL semi-sync (rpl_semi_sync_master_enabled) waits for at least one replica to acknowledge receipt of the binary log event, but not necessarily its replay. PostgreSQL synchronous replication can be configured to wait for write, flush, or replay (remote_apply) on the replica, offering more granular control.2. What is Patroni and how does it prevent split-brain during PostgreSQL failover?
Show answer
Patroni is an automated failover manager for PostgreSQL that uses a distributed consensus store (etcd, ZooKeeper, or Consul) to elect a leader. Only the node holding the leader key in the consensus store can operate as primary, preventing split-brain because consensus stores guarantee only one leader at a time.3. How would you calculate replication lag in seconds on a PostgreSQL replica, and what is the limitation of this approach?
Show answer
Use: SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) AS lag_seconds. The limitation is that if no writes are happening on the primary, the replay timestamp stays stale, making lag appear high even though the replica is fully caught up. Check pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() first to handle this edge case.4. How do fencing and STONITH prevent split-brain after a failover?
Show answer
Fencing isolates the old primary so it cannot accept writes. STONITH (Shoot The Other Node In The Head) powers off or reboots the old primary at the hardware level. Without fencing, a network partition could leave the old primary still accepting writes while the new primary also accepts writes, causing data divergence that is nearly impossible to reconcile automatically.5. What are the main challenges of multi-master (multi-primary) replication?
Show answer
Write conflicts: two nodes may update the same row simultaneously, requiring conflict resolution rules (last-write-wins, application-defined merge, or manual resolution). Increased complexity in schema changes (DDL must be coordinated). Higher replication overhead. Most PostgreSQL deployments avoid multi-master; tools like BDR (Bi-Directional Replication) handle it but add operational complexity.6. Name three use cases where logical replication is preferred over physical streaming replication.
Show answer
1. Replicating a subset of tables (e.g., sharing only the orders table with an analytics database).2. Replicating between different PostgreSQL major versions during a rolling upgrade.
3. Replicating to a database with different indexes, triggers, or additional columns (e.g., a search-optimized replica with extra GIN indexes).