DB Internals

90 cards — 🟢 22 easy | 🟡 40 medium | 🔴 28 hard

🟢 Easy (22)

1. What is the default index type in PostgreSQL and what queries does it support?

Answer: B-tree. It supports equality (=) and range queries (<, >, BETWEEN) on sortable data. It is the most commonly used index type.

Remember: an index is a data structure (usually B-tree) that speeds up lookups at the cost of slower writes and extra storage. Like a book's index — you don't read every page to find a topic.

2. What command shows the actual execution plan with real timings for a query in PostgreSQL?

Answer: EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) followed by the query. ANALYZE is required to actually execute the query and show real timing and row counts instead of estimates.

Remember: EXPLAIN ANALYZE actually executes the query — be careful with UPDATE/DELETE! Wrap in a transaction: BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;
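
The BEGIN/ROLLBACK pattern above can be sketched like this (the orders table and its predicate are hypothetical):

```sql
-- Measure a destructive statement with real timings, then undo it.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE orders SET status = 'archived'
WHERE created_at < now() - interval '1 year';
ROLLBACK;  -- the UPDATE's effects are discarded; the plan and timings were already printed
```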

3. What PostgreSQL command updates table statistics used by the query planner?

Answer: ANALYZE table_name; (for a specific table) or just ANALYZE; (for the entire database). Stale statistics cause the planner to choose suboptimal execution plans.

Remember: EXPLAIN ANALYZE actually runs the query and shows real execution times. Look for: Seq Scan (missing index), Nested Loop on large tables (consider Hash Join), high actual vs estimated rows (stale statistics).

4. When should you NOT add an index to a table?

Answer: Avoid indexing when the table is very small (seq scan is faster), the column has very low selectivity (e.g., boolean with 50/50 distribution), the table is write-heavy with few reads (indexes slow down INSERT/UPDATE/DELETE), or you already have too many indexes causing write amplification and bloat.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

5. What PostgreSQL lock level do INSERT, UPDATE, and DELETE operations acquire?

Answer: RowExclusiveLock. This lock conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE locks, but allows concurrent row modifications on different rows.

Remember: lock granularity, coarse to fine: database > table > page > row. Finer locks = more concurrency but more overhead. Most OLTP workloads use row-level locks.

6. What operations acquire an AccessExclusiveLock in PostgreSQL and why is it dangerous?

Answer: ALTER TABLE, DROP TABLE, and TRUNCATE acquire AccessExclusiveLock, which conflicts with every other lock type. This blocks all concurrent reads and writes on the table, potentially causing connection pile-ups if the operation is slow.

7. How do you prevent a PostgreSQL query from waiting indefinitely for a lock?

Answer: SET lock_timeout = '5s'; This causes the query to fail with an error if it cannot acquire the requested lock within 5 seconds, instead of blocking indefinitely.

8. What is the difference between row-level and table-level locks in PostgreSQL?

Answer: Row-level locks (e.g., from SELECT FOR UPDATE, UPDATE, DELETE) lock individual rows, allowing concurrent access to other rows in the same table. Table-level locks (e.g., AccessExclusiveLock from ALTER TABLE) lock the entire table, blocking all concurrent operations. Row-level locks scale better for concurrent workloads.

9. What is a ShareLock and when does PostgreSQL acquire one?

Answer: ShareLock is acquired by CREATE INDEX (non-concurrent). It allows concurrent reads but blocks writes (INSERT, UPDATE, DELETE). This is why CREATE INDEX on a large table can cause write outages. Use CREATE INDEX CONCURRENTLY instead, which takes a weaker lock and allows writes to continue.
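
The two index builds side by side, as alternatives (the events table and index names are hypothetical):

```sql
-- Blocking build: takes a SHARE lock; writes to events stall until it finishes.
CREATE INDEX idx_events_user_id ON events (user_id);

-- Non-blocking build: allows concurrent writes, but runs longer and
-- cannot be executed inside a transaction block.
CREATE INDEX CONCURRENTLY idx_events_user_id ON events (user_id);
```

Note: if CONCURRENTLY fails partway, it leaves an INVALID index behind (visible in \d output) that must be dropped and retried.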

10. What is the primary purpose of read replicas in a database architecture?

Answer: To scale read traffic by offloading read queries from the primary to one or more replica nodes, reducing load on the primary and improving overall throughput for read-heavy workloads.

11. How do you check if a PostgreSQL instance is currently running as a replica?

Answer: SELECT pg_is_in_recovery(); — returns true if the instance is in recovery mode (i.e., running as a replica), false if it is the primary.

12. How do you measure replication delay in seconds on a PostgreSQL replica?

Answer: SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay; This shows the time since the last replayed transaction from the primary.

Remember: streaming replication sends WAL records to replicas in real-time. Synchronous = guaranteed consistency, higher latency. Asynchronous = faster, risk of data loss on failover.

13. What are recovery conflicts on a PostgreSQL replica and how do you handle them?

Answer: Recovery conflicts occur when a query running on a replica blocks WAL replay (e.g., the primary vacuums a row the replica query is reading). PostgreSQL cancels the query after max_standby_streaming_delay (default 30s). Handle by increasing the delay, using hot_standby_feedback = on (tells primary not to vacuum rows replicas need), or accepting occasional query cancellations.
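
The first two mitigations are plain settings; a minimal sketch (the values shown are examples, not recommendations):

```sql
-- On the replica: give long-running queries more time before cancellation.
ALTER SYSTEM SET max_standby_streaming_delay = '5min';
-- Ask the primary not to vacuum rows that replica queries still need
-- (trade-off: more bloat on the primary).
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();  -- both settings take effect on reload
```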

14. How does PostgreSQL stream changes from a primary to replicas?

Answer: Via WAL (Write-Ahead Log) shipping. Every change is written to WAL first, then streamed to replicas for replay.

Remember: database internals knowledge is what separates a developer who uses databases from an engineer who operates them. Understanding the storage engine prevents mysterious performance issues.

Gotcha: always test database configuration changes on a non-production replica first. A bad setting can crash the database or corrupt data.

15. What is the key difference between synchronous and asynchronous replication?

Answer: In synchronous replication, the primary waits for at least one replica to confirm the write before committing. In asynchronous, the primary commits immediately without waiting, which is faster but risks data loss on failover.

16. What PostgreSQL system view shows the current replication status and lag?

Answer: pg_stat_replication. It shows client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, and can be used to calculate replication lag in bytes.

17. What is the simplest way to monitor replication lag from the primary side?

Answer: Query pg_stat_replication on the primary: SELECT client_addr, state, replay_lag FROM pg_stat_replication; The replay_lag column (PG 10+) shows the time since the last WAL replayed on each replica. Alert if replay_lag exceeds your SLA threshold (e.g., > 5 seconds for warning, > 30 seconds for critical).

18. What do the four letters in ACID stand for and what does each mean?

Answer: Atomicity (all or nothing), Consistency (data moves between valid states), Isolation (concurrent transactions don't interfere), Durability (committed data survives crashes).

Remember: ACID = Atomicity (all or nothing), Consistency (valid state), Isolation (concurrent transactions don't interfere), Durability (committed data survives crashes). Mnemonic: 'All Changes Isolated Durably.'

19. What is the default isolation level in PostgreSQL?

Answer: Read Committed. This means a transaction can only see data committed before each statement (no dirty reads), but may see different results for the same query if another transaction commits between statements (non-repeatable reads).

20. How do you set a statement timeout in PostgreSQL to prevent runaway queries?

Answer: SET statement_timeout = '30s'; for per-session, or configure statement_timeout in postgresql.conf (in milliseconds) for a global default.

21. What does Durability mean in practice and how does PostgreSQL guarantee it?

Answer: Durability means once a transaction is committed, the data survives even if the server crashes immediately after. PostgreSQL guarantees this by writing all changes to the WAL (Write-Ahead Log) and fsyncing the WAL to disk before returning the commit acknowledgment. On recovery, any committed but unwritten data pages are replayed from WAL.

22. What happens if you run statements without BEGIN in PostgreSQL?

Answer: PostgreSQL runs each statement in an implicit auto-commit transaction — the statement is automatically committed on success or rolled back on failure. Wrapping multiple statements in BEGIN...COMMIT groups them into a single atomic transaction where either all succeed or all are rolled back together.
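
The classic illustration of the difference, using a hypothetical accounts table:

```sql
-- Without BEGIN, each UPDATE commits on its own: a crash between the two
-- would lose 100. Wrapped in a transaction, either both apply or neither does.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```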

🟡 Medium (40)

1. How can you identify tables that may be missing indexes in PostgreSQL?

Answer: Query pg_stat_user_tables for tables with high seq_scan counts and high seq_tup_read values relative to idx_scan. A large table with many sequential scans and few or no index scans likely needs an index on commonly filtered columns.

Example: SELECT schemaname, relname, seq_scan, idx_scan FROM pg_stat_user_tables WHERE seq_scan > 1000 AND idx_scan < 50 ORDER BY seq_scan DESC; — high seq_scan + low idx_scan = missing index candidate.

2. In EXPLAIN ANALYZE output, what does a Seq Scan on a large table indicate and how do you fix it?

Answer: A sequential scan on a large table typically means there is no suitable index for the query's WHERE clause or join condition. Fix by creating an index on the filtered columns. Verify improvement by running EXPLAIN ANALYZE again and confirming an Index Scan or Index Only Scan.

3. How do you find the most expensive queries in PostgreSQL using pg_stat_statements?

Answer: SELECT query, calls, total_exec_time, mean_exec_time, rows FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; This requires the pg_stat_statements extension to be loaded.

Remember: PostgreSQL statistics views (pg_stat_user_tables, pg_stat_statements) are your best friends for performance tuning. Enable pg_stat_statements in shared_preload_libraries.

4. What is index bloat and how do you fix it online in PostgreSQL 12+?

Answer: Index bloat occurs when dead tuples accumulate in an index, wasting disk space and slowing queries. Fix with REINDEX INDEX CONCURRENTLY idx_name, which rebuilds the index without locking the table for writes.

5. How should you decide column order in a composite index when multiple columns are filtered?

Answer: Place the most selective column (fewest matching rows) first, then the next most selective, unless one column is used in range scans — put equality columns before range columns. For example, (status, created_at) is better than (created_at, status) if status is tested with = and created_at with BETWEEN.
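
The (status, created_at) example above as DDL plus a query it serves (the orders table is hypothetical):

```sql
-- Equality column first, range column second: the planner descends the
-- B-tree on status, then range-scans created_at within that subtree.
CREATE INDEX idx_orders_status_created ON orders (status, created_at);

-- Served efficiently by the index above:
SELECT * FROM orders
WHERE status = 'pending'
  AND created_at BETWEEN '2024-01-01' AND '2024-02-01';
```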

6. What is an index-only scan and what condition must be met for PostgreSQL to use one?

Answer: An index-only scan satisfies the query entirely from the index without visiting the heap (table). PostgreSQL can use it when all columns in SELECT, WHERE, and ORDER BY are in the index AND the visibility map shows the pages are all-visible (recently vacuumed). Check EXPLAIN for "Index Only Scan" and watch the "Heap Fetches" count.

7. What is a partial index and when is it useful?

Answer: A partial index includes only rows matching a WHERE predicate: CREATE INDEX idx ON orders (created_at) WHERE status = 'pending'. It is smaller and faster than a full index because it skips rows that don't match the predicate. Useful when queries consistently filter on a subset of rows.

8. How do you find and safely remove unused indexes in PostgreSQL?

Answer: Query pg_stat_user_indexes for indexes with idx_scan = 0 (or very low) over a representative time period. Before dropping, verify the index is not used for unique constraints or foreign key lookups. Use DROP INDEX CONCURRENTLY to avoid locking the table during removal.

Example: SELECT schemaname, relname, indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY pg_relation_size(indexrelid) DESC; — zero scans on a large index = a drop candidate (after checking constraints and a representative stats window).

9. How do you find which queries are blocked by locks in PostgreSQL?

Answer: Join pg_locks with pg_stat_activity: query pg_locks for rows WHERE NOT granted to find blocked locks, then join to pg_stat_activity on pid to see the blocked and blocking queries, users, and durations.

10. How does PostgreSQL detect and resolve deadlocks?

Answer: PostgreSQL runs its deadlock detector once a transaction has waited on a lock for deadlock_timeout (default 1 second). When it finds a cycle of lock dependencies, it aborts one transaction in the cycle (the one whose deadlock check fired) and returns a deadlock_detected error (SQLSTATE 40P01) to that client.

11. What is the difference between optimistic and pessimistic locking?

Answer: Pessimistic locking acquires locks before modifying data (SELECT FOR UPDATE), preventing conflicts upfront — best for high-contention scenarios. Optimistic locking checks a version or timestamp at commit time and retries on conflict — best for low-contention scenarios with longer transactions.

12. What does SELECT FOR UPDATE do and when would you use it?

Answer: It takes a table-level ROW SHARE lock plus row-level locks on the selected rows, preventing other transactions from modifying or locking those rows until the current transaction completes. Use it for pessimistic locking patterns like inventory decrement where you need to read-then-write atomically.
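
The inventory-decrement pattern in full, using a hypothetical inventory table:

```sql
-- Pessimistic read-then-write: the row stays locked between the SELECT and
-- the UPDATE, so two concurrent checkouts cannot both see stock = 1.
BEGIN;
SELECT stock FROM inventory WHERE product_id = 42 FOR UPDATE;
-- application checks stock > 0 here
UPDATE inventory SET stock = stock - 1 WHERE product_id = 42;
COMMIT;
```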

13. What are practical strategies to prevent deadlocks in application code?

Answer: Always acquire locks in a consistent order across all transactions (e.g., sort rows by primary key before locking). Keep transactions short to reduce the window for conflicts. Use lock_timeout to fail fast instead of waiting indefinitely. Retry deadlocked transactions with exponential backoff in application code.
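
The consistent-ordering strategy is one line of SQL — a common idiom, sketched here with a hypothetical accounts table:

```sql
-- Locking rows in primary-key order means two transactions touching the same
-- set of accounts queue in the same sequence instead of deadlocking by
-- grabbing them in opposite orders.
SELECT id FROM accounts WHERE id IN (7, 3) ORDER BY id FOR UPDATE;
```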

14. What is lock escalation and does PostgreSQL do it?

Answer: Lock escalation is when a database engine automatically converts many fine-grained locks (row-level) into a coarser lock (table-level) to reduce memory overhead. PostgreSQL does NOT escalate locks — it maintains row-level locks regardless of count. SQL Server and some other engines do escalate, which can cause unexpected blocking.

15. What are advisory locks in PostgreSQL and when would you use them?

Answer: Advisory locks are application-defined locks that PostgreSQL manages but does not enforce on any table or row. Acquire with pg_advisory_lock(key) or pg_try_advisory_lock(key). Use them for application-level coordination like ensuring only one worker processes a job, rate limiting, or distributed mutex patterns. Session-level locks must be explicitly released (pg_advisory_unlock) or are held until the session ends; the pg_advisory_xact_lock variants release automatically at transaction end.
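
A sketch of both acquisition styles; the key 1234 is an arbitrary application-chosen identifier:

```sql
-- Non-blocking session-level lock: returns true/false immediately.
SELECT pg_try_advisory_lock(1234) AS got_lock;
-- ... if true, do the exclusive work, then release:
SELECT pg_advisory_unlock(1234);

-- Transaction-scoped variant: released automatically at COMMIT/ROLLBACK.
BEGIN;
SELECT pg_advisory_xact_lock(1234);
-- ... critical section ...
COMMIT;
```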

16. How do you use pg_stat_activity and pg_locks together to diagnose lock waits?

Answer: Join pg_stat_activity (for query text and duration) with lock information. On PostgreSQL 9.6+ the simplest form uses pg_blocking_pids(): SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, query FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0; On older versions, join pg_locks rows WHERE NOT granted to the granted rows on the same relation, then back to pg_stat_activity on pid for both sides.

17. Name three tools used for connection routing between a PostgreSQL primary and read replicas.

Answer: PgBouncer (connection pooler, does not route by query type), Pgpool-II (query-aware load balancer that routes SELECTs to replicas), and HAProxy (TCP-level load balancing with health checks).

Remember: each database connection consumes memory (~5-10 MB in PostgreSQL). Use connection pooling (PgBouncer, ProxySQL) to serve thousands of app connections with dozens of DB connections.

18. What is the read-after-write consistency problem with read replicas?

Answer: When a user writes data to the primary and then immediately reads from a replica, the replica may not have replayed the write yet due to replication lag. The user sees stale data or gets a 404 for a record they just created. Fix by routing read-your-own-writes to the primary or a synchronous replica.

19. When should reads go to the primary vs an async replica?

Answer: Reads requiring strong consistency (authentication, payments, inventory checks) should go to the primary or a synchronous replica. Reads tolerating eventual consistency (dashboards, search, analytics, reporting) can safely go to async replicas.

20. What are typical alerting thresholds for replication lag on read replicas?

Answer: Warning at replication lag > 5 seconds (sustained for 2 minutes), Critical at lag > 30 seconds or replica disconnected (for 1 minute), Page when replica count drops below the minimum healthy count (e.g., fewer than 2 of 3 replicas connected).

Remember: good alerting follows the RED method for services (Rate, Errors, Duration) and USE method for resources (Utilization, Saturation, Errors).

21. What are two ways to measure replication lag and what are their limitations?

Answer: Time-based: SELECT now() - pg_last_xact_replay_timestamp(). Limitation: shows false lag when the primary is idle (no new transactions to replay).
Byte-based: compare pg_current_wal_lsn() on primary with pg_last_wal_replay_lsn() on replica. Limitation: bytes don't directly translate to time — a large transaction may inflate byte lag without meaning the replica is far behind.

22. How does Pgpool-II route queries differently from PgBouncer?

Answer: Pgpool-II parses SQL and routes SELECT statements to replicas and writes to the primary automatically. PgBouncer is a connection pooler only — it does not inspect queries or route by type. For query-aware routing with PgBouncer, the application must use separate connection strings for read and write pools.

23. What is cascading replication and when is it useful?

Answer: Cascading replication allows a replica to stream WAL to other replicas instead of all replicas connecting directly to the primary. Configure with primary_conninfo pointing to an upstream replica. Useful when you have many replicas — it reduces network and CPU load on the primary by fanning out through intermediate replicas.

24. What metrics should you monitor on read replicas to detect problems early?

Answer: Replication lag (seconds and bytes), replay rate (WAL replayed per second), connection count vs max_connections, CPU and I/O utilization, streaming state in pg_stat_replication (should be 'streaming'), and query cancellation rate on the replica due to recovery conflicts (pg_stat_database_conflicts).

25. Name three common causes of replication lag in PostgreSQL.

Answer: Under-provisioned replica (CPU/IO), long-running queries on the replica blocking WAL replay, network saturation between primary and replica, and large transactions generating excessive WAL.

26. How do you promote a PostgreSQL standby to primary (two methods)?

Answer: Using pg_ctl promote -D /path/to/data on the command line, or via SQL with SELECT pg_promote(). Both convert the standby from recovery mode to a read-write primary.

27. What is split-brain in database replication and why is it dangerous?

Answer: Split-brain occurs when two nodes both believe they are the primary and accept writes simultaneously. This causes data divergence that is extremely difficult to reconcile, potentially leading to data loss or corruption.

28. Name three strategies for preventing split-brain in database clusters.

Answer: Use a consensus mechanism (e.g., Patroni with etcd/ZooKeeper/Consul), implement fencing/STONITH to power off the old primary, and require quorum-based decisions where a majority must agree before promotion.

29. What are the practical trade-offs between synchronous and asynchronous replication?

Answer: Synchronous: zero data loss on failover, but higher write latency (every commit waits for replica ACK) and reduced availability (if the sync replica goes down, writes block unless you configure synchronous_standby_names with multiple candidates).
Asynchronous: lower write latency and no availability impact from replica failures, but potential data loss on failover equal to the replication lag at crash time.

30. What is the difference between WAL shipping (physical) and logical replication?

Answer: WAL shipping sends raw WAL bytes to replicas that replay them identically — the replica is a byte-for-byte copy. Logical replication decodes WAL into logical change events (INSERT, UPDATE, DELETE) and applies them, allowing selective table replication, cross-version replication, and different indexes or schemas on the subscriber. Logical replication has higher overhead but more flexibility.

Remember: WAL = Write-Ahead Log. All changes are written to the log BEFORE being applied to data files. Ensures crash recovery: replay the WAL to restore committed changes that had not yet reached the data files.

31. What are replication slots and why do they matter for WAL retention?

Answer: A replication slot tells the primary to retain WAL segments until the connected replica has consumed them, preventing the primary from recycling WAL that the replica still needs. Without slots, a slow or disconnected replica may fall too far behind and require a full base backup to resync. The risk: a dead replica with an active slot causes unbounded WAL accumulation on the primary, filling the disk.

32. What are the key steps to set up streaming replication in PostgreSQL?

Answer: 1. On the primary: set wal_level = replica, max_wal_senders >= number of replicas, create a replication user.
2. Take a base backup: pg_basebackup -h primary -D /data -U replicator -Fp -Xs -P.
3. On the replica: configure primary_conninfo in postgresql.conf (or recovery.conf for PG < 12) and create standby.signal.
4. Start the replica — it connects and streams WAL continuously.
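
Step 1 can be sketched in SQL (the role name and values are illustrative; wal_level requires a restart, and the replication user must also be allowed in pg_hba.conf):

```sql
-- On the primary:
ALTER SYSTEM SET wal_level = 'replica';   -- takes effect after restart
ALTER SYSTEM SET max_wal_senders = 5;     -- >= number of replicas
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';
```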

33. List the four SQL isolation levels from weakest to strongest.

Answer: Read Uncommitted (allows dirty reads), Read Committed (no dirty reads), Repeatable Read (no non-repeatable reads), Serializable (no phantom reads — full isolation). PostgreSQL treats Read Uncommitted as Read Committed.

34. What happens when PostgreSQL detects a deadlock?

Answer: PostgreSQL automatically detects deadlocks (checking once a lock wait exceeds deadlock_timeout, default 1s) and aborts one of the transactions in the cycle, allowing the others to proceed. The terminated transaction receives an error that the application should handle with a retry.

35. How do you find transactions running longer than 5 minutes in PostgreSQL?

Answer: SELECT pid, now() - xact_start AS duration, query, state FROM pg_stat_activity WHERE state != 'idle' AND xact_start < now() - interval '5 minutes' ORDER BY duration DESC;

Remember: transaction isolation levels (low to high): Read Uncommitted, Read Committed, Repeatable Read, Serializable. Higher isolation = fewer anomalies but more locking/overhead.

36. Why are long-running transactions dangerous in PostgreSQL?

Answer: They hold locks that block other transactions (including DDL like ALTER TABLE), prevent VACUUM from reclaiming dead tuples (causing table and index bloat), increase MVCC overhead, and can exhaust connection pool resources.

37. What is the practical difference between Read Committed and Serializable isolation?

Answer: Read Committed sees only data committed before each individual statement — two identical SELECTs in the same transaction can return different results if another transaction commits between them. Serializable guarantees the transaction sees a frozen snapshot and behaves as if all transactions ran one at a time. Serializable detects conflicts and aborts one transaction, requiring retry logic.

38. What are dirty reads, non-repeatable reads, and phantom reads?

Answer: Dirty read: reading uncommitted data from another transaction (not possible in PostgreSQL). Non-repeatable read: re-reading a row and getting different values because another transaction committed an update. Phantom read: re-running a query and getting different rows because another transaction committed an insert or delete matching the WHERE clause.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

39. What are savepoints and how do they help inside a transaction?

Show answer A savepoint marks a point within a transaction you can roll back to without aborting the entire transaction: SAVEPOINT sp1; ... ROLLBACK TO sp1; This is useful for retrying a portion of work (e.g., an insert that might violate a unique constraint) while keeping earlier work in the same transaction intact.

Remember: transaction isolation levels (low to high): Read Uncommitted, Read Committed, Repeatable Read, Serializable. Higher isolation = fewer anomalies but more locking/overhead.
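The savepoint retry pattern can be sketched like this (the `audit_log` and `users` tables are hypothetical):

```sql
BEGIN;
INSERT INTO audit_log (note) VALUES ('starting import');  -- earlier work we want to keep

SAVEPOINT sp1;
-- This insert might violate a unique constraint on email:
INSERT INTO users (email) VALUES ('a@example.com');
-- On a constraint error, roll back only to the savepoint and take another path:
-- ROLLBACK TO SAVEPOINT sp1;
-- UPDATE users SET last_seen = now() WHERE email = 'a@example.com';

RELEASE SAVEPOINT sp1;  -- optional: discard the savepoint once past the risky step
COMMIT;                 -- the audit_log row commits either way
```

Without the savepoint, the unique violation would abort the whole transaction, losing the earlier insert.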

40. How should connection pools handle transactions?

Show answer Connection pools (PgBouncer, HikariCP) must ensure a transaction runs entirely on a single connection — mid-transaction connection switching corrupts state. In PgBouncer, use transaction pooling mode (not statement mode) and avoid SET commands that persist beyond the transaction. Always close transactions promptly to return connections to the pool; idle-in-transaction connections waste pool capacity.

Remember: transaction isolation levels (low to high): Read Uncommitted, Read Committed, Repeatable Read, Serializable. Higher isolation = fewer anomalies but more locking/overhead.

Remember: each database connection consumes memory (~5-10 MB in PostgreSQL). Use connection pooling (PgBouncer, ProxySQL) to serve thousands of app connections with dozens of DB connections.
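A minimal PgBouncer configuration sketch for transaction pooling (values illustrative, not tuned recommendations):

```ini
; pgbouncer.ini — sketch only; a real config also needs a [databases] section
[pgbouncer]
pool_mode = transaction    ; each transaction gets a connection, released at COMMIT/ROLLBACK
max_client_conn = 2000     ; app-side connections accepted
default_pool_size = 25     ; actual PostgreSQL connections per database/user pair
```

In transaction mode, session-level state (SET, prepared statements, advisory locks held across transactions) does not reliably survive between transactions, which is why the answer warns against persistent SET commands.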

🔴 Hard (28)

1. What is a covering index and how do you create one in PostgreSQL?

Show answer A covering index includes all columns needed by a query so it can be satisfied entirely from the index (index-only scan) without accessing the table. Create with: CREATE INDEX idx_name ON table (filter_col) INCLUDE (col1, col2). The INCLUDE columns are stored in the index but not used for searching.

Remember: an index is a data structure (usually B-tree) that speeds up lookups at the cost of slower writes and extra storage. Like a book's index — you don't read every page to find a topic.
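A sketch of the INCLUDE syntax (available since PostgreSQL 11; the `orders` table and columns are hypothetical):

```sql
-- Filter on customer_id, return only status and total:
CREATE INDEX idx_orders_customer
    ON orders (customer_id) INCLUDE (status, total);

-- Can be satisfied by an index-only scan — verify with EXPLAIN:
SELECT status, total FROM orders WHERE customer_id = 42;
```

Note that index-only scans also depend on the visibility map, so a recently vacuumed table will show them more reliably in EXPLAIN output.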

2. Why does column order matter in a composite index?

Show answer A composite index on (A, B) efficiently supports queries filtering on A alone or on A and B together, but not on B alone. The index follows a leftmost-prefix rule — it can only skip to a specific value of the first column, then scan within it. Ordering columns by selectivity and query patterns is critical.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.
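The leftmost-prefix rule in practice (hypothetical `events` table):

```sql
CREATE INDEX idx_events_user_time ON events (user_id, created_at);

-- Uses the index: seeks to user_id = 7, then scans within it.
SELECT * FROM events WHERE user_id = 7;

-- Uses the index fully: seek on user_id, then range on created_at.
SELECT * FROM events
WHERE user_id = 7 AND created_at > now() - interval '1 day';

-- Cannot seek: the leading column user_id is missing, so the planner
-- typically falls back to a sequential scan (or a full index scan).
SELECT * FROM events WHERE created_at > now() - interval '1 day';
```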

3. How does covering index behavior differ between PostgreSQL and MySQL InnoDB?

Show answer In PostgreSQL, you explicitly create covering indexes with INCLUDE columns. In MySQL InnoDB, every secondary index implicitly includes the primary key columns, and the clustered index (primary key) stores the full row. Look for "Using index" in MySQL EXPLAIN Extra column to confirm an index-only scan.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

4. What is the difference between REINDEX and ANALYZE, and when do you run each?

Show answer REINDEX rebuilds an index from scratch to eliminate bloat (dead space from updates/deletes). ANALYZE updates the planner's statistics about data distribution. Run REINDEX when index bloat causes slow scans or excessive disk usage. Run ANALYZE after bulk loads or major data changes so the planner picks optimal plans.

Remember: an index is a data structure (usually B-tree) that speeds up lookups at the cost of slower writes and extra storage. Like a book's index — you don't read every page to find a topic.

5. Compare B-tree, hash, GIN, and GiST index types in PostgreSQL.

Show answer B-tree: default; supports equality and range queries on sortable data.
Hash: equality only; can be smaller than a B-tree for that case, but was not WAL-logged (and thus not crash-safe) before PostgreSQL 10.
GIN (Generalized Inverted Index): best for multi-valued columns like arrays, JSONB, and full-text search.
GiST (Generalized Search Tree): best for geometric, range, and proximity queries (e.g., PostGIS geometries, range types, nearest-neighbor searches).

Remember: B-tree = balanced tree where each node can have many children. Keeps data sorted, supports O(log n) lookups. Used by PostgreSQL, MySQL, SQLite for indexes.

6. What is an expression index and when would you use one?

Show answer An expression index indexes the result of a function or expression: CREATE INDEX idx ON users (lower(email)). Use it when queries filter on transformed values (e.g., case-insensitive search, date truncation). Without it, PostgreSQL cannot use a regular index on the column because the expression changes the lookup value.

Remember: an index is a data structure (usually B-tree) that speeds up lookups at the cost of slower writes and extra storage. Like a book's index — you don't read every page to find a topic.

7. Why should you set lock_timeout before running ALTER TABLE in production, and what happens if it times out?

Show answer ALTER TABLE acquires AccessExclusiveLock, which blocks all reads and writes. If the table has active queries, the ALTER waits for them, and new queries queue behind it — creating a cascading pile-up. Setting lock_timeout = '3s' causes the ALTER to fail fast if it cannot get the lock, avoiding the pile-up. On timeout, the ALTER is aborted and no schema change is made.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.
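A sketch of the fail-fast pattern (the `orders` table is hypothetical):

```sql
-- Applies for the rest of the session; use SET LOCAL inside a transaction
-- to scope it to that transaction only.
SET lock_timeout = '3s';

-- Aborts with an error if AccessExclusiveLock is not granted within 3s,
-- instead of queueing and blocking all new queries behind it:
ALTER TABLE orders ADD COLUMN notes text;
-- On timeout: nothing changed; retry later or during a low-traffic window.
```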

8. What do NOWAIT and SKIP LOCKED do with SELECT FOR UPDATE, and what are their use cases?

Show answer NOWAIT causes the query to fail immediately if any selected row is already locked (instead of waiting). SKIP LOCKED silently skips rows that are locked by other transactions. SKIP LOCKED is ideal for work-queue patterns where multiple workers pull jobs from the same table without blocking each other.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.
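The work-queue pattern with SKIP LOCKED can be sketched as (hypothetical `jobs` table):

```sql
BEGIN;
-- Each worker claims the oldest unclaimed pending job; concurrently
-- locked rows are skipped, so workers never block each other.
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ...process the claimed job in application code, then mark it done:
UPDATE jobs SET status = 'done' WHERE id = 42;  -- 42 = the claimed id
COMMIT;
```

The row lock released at COMMIT is what makes the claim safe: if the worker crashes mid-transaction, the job returns to the pool automatically.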

9. How does MySQL InnoDB lock wait behavior differ from PostgreSQL?

Show answer MySQL InnoDB bounds lock waits with innodb_lock_wait_timeout (default 50 seconds), whereas PostgreSQL's lock_timeout is disabled by default, so a blocked query can wait indefinitely. MySQL InnoDB also uses gap locks at REPEATABLE READ isolation to prevent phantom reads, which can cause unexpected lock conflicts on ranges of rows that PostgreSQL avoids through its MVCC implementation.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

10. How do long-running transactions cause lock pile-ups in production?

Show answer A long-running transaction holds its locks for the entire duration. If an ALTER TABLE then arrives, it requests AccessExclusiveLock and queues behind the long transaction. All subsequent queries on that table queue behind the ALTER, creating a cascading pile-up. Even read queries are blocked because they cannot acquire AccessShareLock while AccessExclusiveLock is pending.

Remember: transaction isolation levels (low to high): Read Uncommitted, Read Committed, Repeatable Read, Serializable. Higher isolation = fewer anomalies but more locking/overhead.

11. How do you implement optimistic locking with a version column?

Show answer Add a version integer column to the table. On read, fetch the current version. On update: UPDATE items SET data = $1, version = version + 1 WHERE id = $2 AND version = $3. If zero rows are affected, another transaction modified the row — retry the entire read-modify-write cycle. This avoids holding locks during user think time.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.
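The cycle from the answer, spelled out (hypothetical `items` table):

```sql
-- 1. Read the row and remember its version:
SELECT data, version FROM items WHERE id = 42;
-- suppose this returns version = 7

-- 2. Apply the change only if nobody updated the row in the meantime:
UPDATE items
SET data = 'new value', version = version + 1
WHERE id = 42 AND version = 7;
-- If the driver reports 0 rows affected, another transaction won the
-- race: re-read the row and retry the whole read-modify-write cycle.
```

No row lock is held between steps 1 and 2, which is the whole point: user think time between read and write costs nothing.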

12. How does PostgreSQL's synchronous_commit setting enable per-transaction consistency control with replicas?

Show answer synchronous_commit can be set per transaction: 'remote_apply' ensures the write is replayed on the sync replica before commit returns (safe for immediate reads from that replica). 'local' only waits for local WAL flush (faster but replica may lag). This lets you choose strong vs eventual consistency on a per-query basis.

Remember: transaction isolation levels (low to high): Read Uncommitted, Read Committed, Repeatable Read, Serializable. Higher isolation = fewer anomalies but more locking/overhead.
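A sketch of the per-transaction knob (the `payments` and `page_views` tables are hypothetical):

```sql
-- Critical write: do not return until the sync replica has replayed it,
-- so an immediate read from that replica sees the new row.
BEGIN;
SET LOCAL synchronous_commit = 'remote_apply';
INSERT INTO payments (user_id, amount) VALUES (1, 100);
COMMIT;

-- Non-critical write: return as soon as the local WAL flush completes.
BEGIN;
SET LOCAL synchronous_commit = 'local';
INSERT INTO page_views (user_id) VALUES (1);
COMMIT;
```

SET LOCAL scopes the setting to the current transaction, so the two commits get different durability guarantees on the same connection.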

13. In a read replica architecture, what happens if all replicas become unreachable?

Show answer All read traffic either fails (if the application requires replicas) or falls back to the primary (if routing allows it), potentially overloading the primary. A well-designed system has circuit breakers to detect failed replicas, routes reads to the primary as a fallback with rate limiting, and alerts immediately so ops can restore replicas or add capacity.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

14. How do you verify from the primary that all expected replicas are connected and healthy?

Show answer SELECT application_name, client_addr, state, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes, replay_lag FROM pg_stat_replication; Check that the count matches expected replicas, all show state = 'streaming', and replay_lag_bytes is within acceptable thresholds.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

15. Name three strategies for achieving read-after-write consistency with async replicas.

Show answer 1. Route read-your-own-writes to the primary for a short window after each write (e.g., 5 seconds).
2. Include the primary's LSN in the write response and only read from a replica once its replay LSN has caught up.
3. Use synchronous_commit = 'remote_apply' for critical writes so the replica is guaranteed to have replayed them before the commit returns.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.
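Strategy 2 can be sketched with the built-in LSN functions (the literal LSN value is purely illustrative):

```sql
-- On the primary, immediately after the write — capture the current LSN:
SELECT pg_current_wal_lsn();
-- e.g. returns '0/3000060'

-- On a candidate replica, before serving the read — check replay progress:
SELECT pg_last_wal_replay_lsn() >= '0/3000060'::pg_lsn AS caught_up;
-- Read from this replica only when caught_up is true; otherwise
-- try another replica or fall back to the primary.
```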

16. What steps are involved in promoting a replica to primary during failover?

Show answer 1. Verify the replica is the most up-to-date (least lag).
2. Fence or stop the old primary to prevent split-brain.
3. Promote the replica: SELECT pg_promote() or pg_ctl promote.
4. Update DNS or connection routing to point at the new primary.
5. Reconfigure remaining replicas to follow the new primary.
6. Verify replication resumes and applications reconnect.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

17. When can adding read replicas make performance worse instead of better?

Show answer When the workload is write-heavy (replicas must replay all writes but serve few reads), when replication lag causes frequent cache invalidation or application retries, when queries on replicas conflict with WAL replay causing cancellations (max_standby_streaming_delay), or when the connection routing layer adds more latency than the replica offloads.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

18. How does MySQL semi-synchronous replication differ from PostgreSQL synchronous replication?

Show answer MySQL semi-sync (rpl_semi_sync_master_enabled) waits for at least one replica to acknowledge receipt of the binary log event, but not necessarily its replay. PostgreSQL synchronous replication can be configured to wait for write, flush, or replay (remote_apply) on the replica, offering more granular control.

Remember: streaming replication sends WAL records to replicas in real-time. Synchronous = guaranteed consistency, higher latency. Asynchronous = faster, risk of data loss on failover.

19. What is Patroni and how does it prevent split-brain during PostgreSQL failover?

Show answer Patroni is an automated failover manager for PostgreSQL that uses a distributed consensus store (etcd, ZooKeeper, or Consul) to elect a leader. Only the node holding the leader key in the consensus store can operate as primary, preventing split-brain because consensus stores guarantee only one leader at a time.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

20. How would you calculate replication lag in seconds on a PostgreSQL replica, and what is the limitation of this approach?

Show answer Use: SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) AS lag_seconds. The limitation is that if no writes are happening on the primary, the replay timestamp stays stale, making lag appear high even though the replica is fully caught up. Check pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() first to handle this edge case.

Remember: streaming replication sends WAL records to replicas in real-time. Synchronous = guaranteed consistency, higher latency. Asynchronous = faster, risk of data loss on failover.
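Combining the lag query with the caught-up check handles the idle-primary edge case in one statement (run on the replica):

```sql
-- Report 0 when fully caught up, otherwise seconds since the last replay.
SELECT CASE
         WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
           THEN 0
         ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
       END AS lag_seconds;
```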

21. How do fencing and STONITH prevent split-brain after a failover?

Show answer Fencing isolates the old primary so it cannot accept writes. STONITH (Shoot The Other Node In The Head) powers off or reboots the old primary at the hardware level. Without fencing, a network partition could leave the old primary still accepting writes while the new primary also accepts writes, causing data divergence that is nearly impossible to reconcile automatically.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

22. What are the main challenges of multi-master (multi-primary) replication?

Show answer Write conflicts: two nodes may update the same row simultaneously, requiring conflict resolution rules (last-write-wins, application-defined merge, or manual resolution). Increased complexity in schema changes (DDL must be coordinated). Higher replication overhead. Most PostgreSQL deployments avoid multi-master; tools like BDR (Bi-Directional Replication) handle it but add operational complexity.

Remember: streaming replication sends WAL records to replicas in real-time. Synchronous = guaranteed consistency, higher latency. Asynchronous = faster, risk of data loss on failover.

23. Name three use cases where logical replication is preferred over physical streaming replication.

Show answer 1. Replicating a subset of tables (e.g., sharing only the orders table with an analytics database).
2. Replicating between different PostgreSQL major versions during a rolling upgrade.
3. Replicating to a database with different indexes, triggers, or additional columns (e.g., a search-optimized replica with extra GIN indexes).

Remember: streaming replication sends WAL records to replicas in real-time. Synchronous = guaranteed consistency, higher latency. Asynchronous = faster, risk of data loss on failover.

24. How does MySQL's default isolation level differ from PostgreSQL's, and what is the practical impact?

Show answer MySQL defaults to REPEATABLE READ, while PostgreSQL defaults to READ COMMITTED. MySQL uses gap locks at Repeatable Read to prevent phantom reads, which can increase lock contention. PostgreSQL's MVCC-based Repeatable Read avoids extra locking but may produce serialization failures that require application retries.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

25. When would you use SERIALIZABLE isolation and what is the operational cost?

Show answer Use Serializable for transactions that must appear to execute sequentially (e.g., financial transfers, inventory reservations). The cost is higher abort rates due to serialization conflicts, increased CPU overhead from dependency tracking, and the need for application-level retry logic. PostgreSQL implements it via Serializable Snapshot Isolation (SSI).

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.

26. How do you safely terminate a specific backend in PostgreSQL, and what is the difference between pg_cancel_backend and pg_terminate_backend?

Show answer pg_cancel_backend(pid) cancels the current query but keeps the connection alive (equivalent to sending SIGINT). pg_terminate_backend(pid) kills the entire backend process and closes the connection (equivalent to SIGTERM). Use cancel first for a gentle stop; use terminate when the backend is stuck or unresponsive.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.
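The escalation in practice (the pid is hypothetical; find real ones in pg_stat_activity):

```sql
-- Gentle: cancel the running query, keep the connection open (SIGINT).
SELECT pg_cancel_backend(12345);

-- Forceful: kill the backend process and close its connection (SIGTERM).
SELECT pg_terminate_backend(12345);
```

Both return true if the signal was sent; they do not guarantee the backend has already stopped by the time the call returns.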

27. What is two-phase commit (2PC) and when is it needed?

Show answer 2PC coordinates a transaction across multiple databases or services. Phase 1 (prepare): all participants confirm they can commit. Phase 2 (commit): the coordinator tells all to commit. If any participant fails to prepare, all abort. In PostgreSQL, use PREPARE TRANSACTION 'id' and COMMIT PREPARED 'id'. Needed for distributed transactions, but adds latency and complexity — prefer saga patterns where possible.

Under the hood: understanding database internals (page layout, WAL, MVCC, buffer pool) helps you diagnose performance issues that high-level monitoring alone cannot explain.
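A sketch of the PostgreSQL side of 2PC (requires max_prepared_transactions > 0, which defaults to 0; table and transaction id are hypothetical):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Phase 1: the transaction is persisted and survives crashes,
-- but keeps holding its locks until resolved.
PREPARE TRANSACTION 'xfer-123';

-- Phase 2, after every participant has prepared successfully:
COMMIT PREPARED 'xfer-123';

-- Or, if any participant failed to prepare:
-- ROLLBACK PREPARED 'xfer-123';
```

Orphaned prepared transactions (visible in pg_prepared_xacts) hold locks and block VACUUM indefinitely, which is one reason the answer suggests preferring saga patterns.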

28. What is transaction ID wraparound in PostgreSQL and why is it dangerous?

Show answer PostgreSQL uses 32-bit transaction IDs (XIDs), which wrap around after ~4 billion transactions. If VACUUM does not freeze old row versions in time, the database must shut down to prevent data loss (it would interpret old committed rows as being in the future). Monitor with: SELECT datname, age(datfrozenxid) FROM pg_database; Alert when age exceeds 500 million.

Remember: transaction isolation levels (low to high): Read Uncommitted, Read Committed, Repeatable Read, Serializable. Higher isolation = fewer anomalies but more locking/overhead.