Databases¶
25 cards — 🟢 9 easy | 🟡 9 medium | 🔴 7 hard
🟢 Easy (9)¶
1. What is database sharding and when is it needed?
Sharding is horizontal partitioning: splitting data across multiple database instances, each holding a subset of rows. It is needed when a single database server cannot handle the write throughput, storage volume, or query load.
Gotcha: cross-shard JOINs and transactions become very expensive or impossible. Design your shard key so most queries hit a single shard.
Example: shard by user_id — all of user 42's data lives on one shard. Queries for user 42 hit one database, not all of them.
Remember: sharding = horizontal partitioning across multiple database instances. Each shard holds a subset of data. Key challenge: choosing the shard key to avoid hotspots.
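A minimal sketch of shard-key routing, assuming a fixed 4-shard topology with invented DSNs: a modulo hash on the shard key sends each user's queries to exactly one database.

```python
# Hypothetical shard topology: 4 shards, DSNs invented for illustration.
NUM_SHARDS = 4
SHARD_DSNS = [f"postgres://db-shard-{i}.internal/app" for i in range(NUM_SHARDS)]

def shard_for(user_id: int) -> str:
    """Modulo hashing: all of one user's rows live on a single shard."""
    return SHARD_DSNS[user_id % NUM_SHARDS]
```

With this scheme, user 42 always maps to the same shard, so single-user queries never fan out. The trade-off: adding a shard changes most keys' placement, which is why production systems often use consistent hashing or range-based shard maps instead.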
2. What is a connection leak?
A connection leak occurs when a database connection is opened but never closed once it is no longer needed.
Debug clue: monitor connection count in pg_stat_activity (PostgreSQL) or SHOW PROCESSLIST (MySQL). A steadily growing number of idle connections is a leak.
Gotcha: connection leaks eventually exhaust max_connections, causing new connections to fail with 'too many connections' errors — often during peak traffic.
Example: in Python, always use context managers: with conn.cursor() as cur: ... or connection pool libraries that auto-reclaim connections.
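A minimal sketch of leak-free connection handling, using stdlib sqlite3 as a stand-in for any database driver; the `users` table is invented for the example:

```python
import sqlite3

def count_users(db_path: str) -> int:
    """Open a connection, query, and ALWAYS close it, even on error."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    finally:
        conn.close()  # runs whether the query succeeded or raised, so nothing leaks
```

The `try/finally` (or an equivalent context manager) is the whole fix: the close path cannot be skipped by an exception.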
3. What is a connection pool?
A connection pool is a cache of reusable database connections, maintained to avoid the overhead of establishing a new connection for every query.
Number anchor: a PostgreSQL connection costs ~10MB of RAM. Without pooling, 100 microservice replicas each opening 10 connections = 1000 connections = 10GB of PostgreSQL memory just for connections.
Example: PgBouncer (PostgreSQL) and ProxySQL (MySQL) are popular external connection poolers that multiplex thousands of app connections over a few dozen database connections.
Remember: database fundamentals (ACID, indexing, normalization, query optimization) transfer across all database systems. Master the concepts, then learn vendor-specific syntax.
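A toy pool sketch (not production code) shows the core idea: pre-open N connections and hand them out, rather than opening one per query. Real poolers like PgBouncer add health checks, timeouts, and transaction-level multiplexing on top.

```python
import sqlite3
from queue import Queue

class SimplePool:
    """Toy connection pool: reuse a fixed set of connections."""

    def __init__(self, db_path: str, size: int = 5):
        self._pool: Queue = Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets pooled connections cross threads
            self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

    def acquire(self) -> sqlite3.Connection:
        return self._pool.get()   # blocks until a connection is free

    def release(self, conn: sqlite3.Connection) -> None:
        self._pool.put(conn)      # returned to the pool, never closed
```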
4. What is a Data Warehouse?
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of an organisation's decision-making process.
5. What is a relational database?
* Data Storage: a system to store data in tables
* SQL: a programming language to manage relational databases
* Data Definition Language: a standard syntax to create, alter and delete tables
Name origin: 'Relational' comes from E.F. Codd's 1970 paper introducing the relational model, where data is organized into 'relations' (tables).
Example: popular relational databases — PostgreSQL (open source, feature-rich), MySQL/MariaDB (web-scale), SQLite (embedded), Oracle (enterprise), SQL Server (Microsoft).
6. What is an index in a database?
A database index is a data structure that improves the speed of operations on a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
Remember: an index is a data structure (usually a B-tree) that speeds up lookups at the cost of slower writes and extra storage. Like a book's index — you don't read every page to find a topic.
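The effect is visible in any engine's query plan. A sketch with stdlib sqlite3 (table and index names invented): before the index the planner scans every row; after, it searches the B-tree directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

def plan(sql: str) -> str:
    # The last column of EXPLAIN QUERY PLAN output describes the access path
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

query = "SELECT * FROM users WHERE email = 'a@example.com'"
before = plan(query)   # full table scan: every row examined
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)    # index search: direct lookup via the B-tree
```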
7. What type of databases are you familiar with?
* Relational (SQL)
* NoSQL
* Time series
Remember: the major categories — Relational (PostgreSQL, MySQL), Document (MongoDB), Key-Value (Redis, DynamoDB), Column-family (Cassandra), Graph (Neo4j), Time-series (InfluxDB, TimescaleDB).
Interview tip: mention specific databases you have used and why — e.g., 'PostgreSQL for ACID transactions, Redis for caching, TimescaleDB for metrics.'
Remember: choose your database based on access patterns, not brand loyalty. Relational for transactions, document for flexible schemas, graph for relationships, time-series for metrics.
Gotcha: 'NoSQL' does not mean 'no schema' — it means schema is enforced at the application level instead of the database level. This shifts responsibility to developers.
8. What is DDL (Data Definition Language) in SQL?
[Wikipedia](https://en.wikipedia.org/wiki/Data_definition_language): "In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users."
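The core DDL verbs (CREATE, ALTER, DROP) can be exercised directly with stdlib sqlite3; the `orders` schema here is invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE / ALTER / DROP are the core DDL statements:
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("ALTER TABLE orders ADD COLUMN status TEXT DEFAULT 'new'")
conn.execute("CREATE INDEX idx_orders_status ON orders(status)")
conn.execute("DROP INDEX idx_orders_status")

# The catalog now reflects every DDL change above:
cols = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
```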
9. When would you choose a NoSQL database over a relational database?
Choose NoSQL when:
* Schema is highly variable or evolving rapidly (document stores like MongoDB)
* You need horizontal write scaling (Cassandra, DynamoDB)
* Data is naturally key-value or graph-shaped (Redis, Neo4j)
* You need very low-latency lookups at massive scale
* Eventual consistency is acceptable
Stick with relational when:
* You need strong consistency and complex transactions (ACID)
* Your data has clear relationships and you need JOINs
* You need mature tooling for reporting and analytics
Remember: SQL = structured, ACID, joins, schema. NoSQL = flexible schema, horizontal scaling, eventual consistency. Use SQL for transactions, NoSQL for scale + flexibility.
🟡 Medium (9)¶
1. What is ORM? What benefits does it provide for relational database usage?
[Wikipedia](https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping): ORM "is a programming technique for converting data between incompatible type systems using object-oriented programming languages". For relational databases, an ORM provides:
* Database as code
* Database abstraction
* Encapsulates SQL complexity
* Enables code review process
* Enables usage as a native OOP structure
2. What types of data tables have you used?
* Primary data table: the main data you care about
* Details table: includes a foreign key and has a one-to-many relationship with the primary table
* Lookup values table: either one table per lookup or a single table containing all lookups, with a one-to-many relationship
* Multi-reference table
3. Explain Normalization
Data that is used multiple times in a database should be stored once and referenced with a foreign key. This has the clear benefit of ease of maintenance: you change a value in a single place and it changes everywhere.
Remember: normalization forms: 1NF (atomic values), 2NF (no partial dependencies), 3NF (no transitive dependencies). Mnemonic: 'The key, the whole key, and nothing but the key.'
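A minimal sketch of that single-place-of-change benefit, using stdlib sqlite3 with an invented departments/employees schema: the department name lives in one row, so renaming it touches one row and every joined read sees the new value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each department name is stored exactly once.
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Engineering');
    INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Grace', 1);
    -- Renaming the department is a single-row change:
    UPDATE departments SET name = 'Platform' WHERE id = 1;
""")
row = conn.execute("""
    SELECT e.name, d.name
    FROM employees e JOIN departments d ON d.id = e.dept_id
    WHERE e.id = 2
""").fetchone()
```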
4. Explain Primary Key and Foreign Key
Primary Key: each row in every table should have a unique identifier that represents the row.
Foreign Key: a reference to another table's primary key. This allows you to join tables together to retrieve all the information you need without duplicating data.
5. What is Time Series database?
A database designed specifically for time-series data: data points indexed by time, such as metrics, logs, and sensor readings.
Key optimizations:
* **Time-based partitioning**: Data is automatically partitioned by time intervals, enabling efficient queries over specific time ranges
* **Compression**: Specialized compression algorithms that exploit the temporal nature of data (e.g., delta encoding, run-length encoding)
6. Explain database replication and when you would use it in production.
Database replication copies data from one database server (the primary/master) to one or more replicas. Use cases:
* Read scaling: route read queries to replicas to reduce primary load
* High availability: promote a replica if the primary fails
* Disaster recovery: maintain replicas in another region/AZ
* Reporting: run heavy analytics queries against a replica
Replication modes:
* Synchronous: primary waits for replica acknowledgment (stronger consistency, higher latency)
* Asynchronous: primary does not wait (lower latency, risk of data loss on failover)
* Semi-synchronous: primary waits for at least one replica
7. What strategies exist for database backup and recovery?
Common strategies:
* Logical backups (pg_dump, mysqldump): SQL-level export, portable but slow for large DBs
* Physical backups (pg_basebackup, xtrabackup): file-level copy, faster restore
* WAL/binlog archiving: continuous archiving enables point-in-time recovery (PITR)
* Snapshots: storage-level snapshots (EBS, ZFS) for near-instant backup
Best practices:
* Test restores regularly
* Store backups off-site or in a different region
* Monitor backup job success/failure
* Document RTO and RPO targets
8. How should database schema migrations be handled in a DevOps workflow?
Best practices:
* Use a migration tool (Flyway, Alembic, Liquibase) to version schema changes
* Store migrations in version control alongside application code
* Make migrations idempotent and reversible when possible
* Run migrations as part of CI/CD pipeline, before application deployment
* Avoid destructive changes in production (prefer additive: add column, backfill, remove old)
* Test migrations against a copy of production data
* Use transactions for DDL where supported (PostgreSQL does, MySQL mostly does not)
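The practices above can be sketched as a tiny migration runner, assuming a hypothetical `MIGRATIONS` list that real tools would load from versioned files: each pending migration is applied in order and its version recorded, so reruns are no-ops.

```python
import sqlite3

# Hypothetical ordered migrations; tools like Flyway or Alembic load these from files.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]

def migrate(conn: sqlite3.Connection) -> None:
    """Apply pending migrations in order, recording each applied version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_version")}
    for version, sql in MIGRATIONS:
        if version not in applied:
            conn.execute(sql)
            # record the version so a second run skips this migration
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            conn.commit()
```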
9. What database metrics should you monitor in production?
Key metrics by category:
* Connections: active connections, pool utilization, connection wait time
* Performance: query latency (p50/p95/p99), queries per second, slow query count
* Resources: CPU, memory/buffer cache hit ratio, disk I/O (IOPS, latency)
* Replication: replica lag, replication slot status
* Storage: table size growth, WAL/binlog size, disk space remaining
* Locks: lock wait time, deadlock count, long-running transactions
Tools: pg_stat_statements, Performance Schema (MySQL), CloudWatch RDS, postgres_exporter
🔴 Hard (7)¶
1. You find out your database has become a bottleneck and users experience issues accessing data. How can you deal with such a situation?
Not much information is provided about why it became a bottleneck or what the current architecture looks like, so one general approach is to reduce the load on the database by moving frequently accessed data into an in-memory store (a cache).
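One concrete form of that idea is a cache-aside read path. This sketch uses a plain dict in place of Redis or Memcached, with an invented `users` schema; the second assertion in a typical run also shows the trade-off (the cache can serve stale data):

```python
import sqlite3

cache: dict = {}  # stand-in for Redis/Memcached

def get_username(conn: sqlite3.Connection, user_id: int) -> str:
    """Cache-aside: serve repeat reads from memory, hit the DB only on a miss."""
    if user_id in cache:
        return cache[user_id]                  # cache hit: database untouched
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    cache[user_id] = row[0]                    # populate the cache on a miss
    return row[0]
```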
2. What does it mean when a database is ACID compliant?
ACID ensures reliable database transactions:
- **Atomicity**: Transactions succeed or fail as a whole — no partial updates
- **Consistency**: Every transaction moves the DB from one valid state to another, enforced by constraints
- **Isolation**: Concurrent transactions don't see each other's intermediate states
- **Durability**: Committed data survives crashes (written to non-volatile storage)
SQL databases are ACID by nature. Some NoSQL databases (e.g., MongoDB 4.0+) support ACID transactions, but most NoSQL systems trade ACID guarantees for performance and scalability.
Remember: ACID = Atomicity (all or nothing), Consistency (valid state), Isolation (concurrent transactions don't interfere), Durability (committed data survives crashes). Mnemonic: 'All Changes Isolated Durably.'
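Atomicity in particular is easy to demonstrate with stdlib sqlite3 (account names invented): a crash mid-transfer rolls back every statement in the transaction, so the debit is never half-applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # transaction scope: commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 60 WHERE name = 'alice'")
        raise RuntimeError("crash between debit and credit")
except RuntimeError:
    pass  # the debit above was rolled back, not left half-applied

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```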
3. Your database performs more slowly than usual; specifically, your queries are taking a long time. What would you do?
* Query for running queries and cancel the irrelevant ones
* Check for connection leaks (query for running connections and include their IP)
* Check for table locks and kill irrelevant locking sessions
4. What is the CAP theorem and how does it apply to database selection?
The CAP theorem states that a distributed system can guarantee at most two of three properties simultaneously:
* Consistency: every read returns the most recent write
* Availability: every request receives a response
* Partition tolerance: the system continues operating despite network partitions
In practice, network partitions are unavoidable, so the trade-off is between consistency and availability during a partition.
CP: etcd, ZooKeeper, HBase
AP: Cassandra, DynamoDB (eventual consistency)
CA: single-node PostgreSQL (no partition tolerance)
Remember: CAP = Consistency, Availability, Partition tolerance. Pick 2 of 3 during a network partition. CP = consistent but may reject requests. AP = available but may return stale data.
5. How do you diagnose and fix a slow query in PostgreSQL?
Diagnostic steps:
1. Enable slow query logging: set log_min_duration_statement (e.g., 500ms)
2. Use EXPLAIN ANALYZE to see the actual execution plan
3. Look for sequential scans on large tables, nested loops, inaccurate row estimates
4. Check pg_stat_statements for aggregate query performance
Common fixes:
* Add missing indexes (check pg_stat_user_tables for seq_scan counts)
* Rewrite queries to avoid unnecessary joins or subqueries
* Run ANALYZE to update planner statistics
* Tune work_mem, shared_buffers, effective_cache_size
* Consider partitioning for very large tables
6. Describe a high-availability PostgreSQL setup and its failure modes.
Typical HA setup:
* Primary + synchronous standby (or a Patroni-managed cluster)
* Connection pooler (PgBouncer) in front
* Automated failover via Patroni, repmgr, or cloud-managed (RDS Multi-AZ)
Failure modes:
* Primary crash: standby promoted, clients reconnect (brief downtime)
* Network partition: split-brain risk if fencing misconfigured, use quorum/witness
* Replica lag: synchronous replication prevents data loss but adds latency
* Connection pooler failure: run multiple instances behind a VIP or DNS
* Storage failure: RAID or cloud volume replication protects against single-disk failure
7. Explain transaction isolation levels and when each is appropriate.
The SQL standard defines four levels (weakest to strongest):
* Read Uncommitted: can see uncommitted changes (dirty reads). Rarely used.
* Read Committed: only sees committed data. Default in PostgreSQL. Good for most OLTP.
* Repeatable Read: snapshot at transaction start. Default in MySQL InnoDB. Good for reporting.
* Serializable: transactions behave as if sequential. Highest overhead. Use for financial calculations or inventory where correctness is critical.
Higher isolation = more locking/overhead = lower throughput.
Remember: transaction isolation levels (low to high): Read Uncommitted, Read Committed, Repeatable Read, Serializable. Higher isolation = fewer anomalies but more locking/overhead.
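The no-dirty-read guarantee can be sketched with two stdlib sqlite3 connections to the same file database (sqlite's default behavior for a separate reader is roughly Read Committed; the file path is a temp file created for the example):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "iso.db")

# isolation_level=None: manage transactions with explicit BEGIN/COMMIT
writer = sqlite3.connect(path, isolation_level=None)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE events (id INTEGER)")
writer.execute("BEGIN")
writer.execute("INSERT INTO events VALUES (1)")   # written but NOT committed

# The reader cannot see the uncommitted row: no dirty read.
before = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]

writer.execute("COMMIT")
after = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```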