Db Indexing¶

18 cards — 🟢 4 easy | 🟡 8 medium | 🔴 6 hard

🟢 Easy (4)¶

1. What is the default index type in PostgreSQL and what queries does it support?

Show answer

B-tree. It supports equality (=) and range queries (<, >, BETWEEN) on sortable data. It is the most commonly used index type.

2. What command shows the actual execution plan with real timings for a query in PostgreSQL?

Show answer

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) followed by the query. ANALYZE is required to actually execute the query and show real timing and row counts instead of estimates.

3. What PostgreSQL command updates table statistics used by the query planner?

Show answer

ANALYZE table_name; (for a specific table) or just ANALYZE; (for the entire database). Stale statistics cause the planner to choose suboptimal execution plans.

4. When should you NOT add an index to a table?

Show answer

Avoid indexing when the table is very small (seq scan is faster), the column has very low selectivity (e.g., boolean with 50/50 distribution), the table is write-heavy with few reads (indexes slow down INSERT/UPDATE/DELETE), or you already have too many indexes causing write amplification and bloat.

🟡 Medium (8)¶

1. How can you identify tables that may be missing indexes in PostgreSQL?

Show answer

Query pg_stat_user_tables for tables with high seq_scan counts and high seq_tup_read values relative to idx_scan. A large table with many sequential scans and few or no index scans likely needs an index on commonly filtered columns.

2. In EXPLAIN ANALYZE output, what does a Seq Scan on a large table indicate and how do you fix it?

Show answer

A sequential scan on a large table typically means there is no suitable index for the query's WHERE clause or join condition. Fix by creating an index on the filtered columns. Verify improvement by running EXPLAIN ANALYZE again and confirming an Index Scan or Index Only Scan.

3. How do you find the most expensive queries in PostgreSQL using pg_stat_statements?

Show answer

SELECT query, calls, total_exec_time, mean_exec_time, rows FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10; This requires the pg_stat_statements extension to be loaded.

4. What is index bloat and how do you fix it online in PostgreSQL 12+?

Show answer

Index bloat occurs when dead tuples accumulate in an index, wasting disk space and slowing queries. Fix with REINDEX INDEX CONCURRENTLY idx_name, which rebuilds the index without locking the table for writes.

5. How should you decide column order in a composite index when multiple columns are filtered?

Show answer

Place the most selective column (fewest matching rows) first, then the next most selective, unless one column is used in range scans — put equality columns before range columns. For example, (status, created_at) is better than (created_at, status) if status is tested with = and created_at with BETWEEN.

6. What is an index-only scan and what condition must be met for PostgreSQL to use one?

Show answer

An index-only scan satisfies the query entirely from the index without visiting the heap (table). PostgreSQL can use it when all columns in SELECT, WHERE, and ORDER BY are in the index AND the visibility map shows the pages are all-visible (recently vacuumed). Check EXPLAIN for "Index Only Scan" and watch the "Heap Fetches" count.

7. What is a partial index and when is it useful?

Show answer

A partial index includes only rows matching a WHERE predicate: CREATE INDEX idx ON orders (created_at) WHERE status = 'pending'. It is smaller and faster than a full index because it skips rows that don't match the predicate. Useful when queries consistently filter on a subset of rows.

8. How do you find and safely remove unused indexes in PostgreSQL?

Show answer

Query pg_stat_user_indexes for indexes with idx_scan = 0 (or very low) over a representative time period. Before dropping, verify the index is not used for unique constraints or foreign key lookups. Use DROP INDEX CONCURRENTLY to avoid locking the table during removal.

🔴 Hard (6)¶

1. What is a covering index and how do you create one in PostgreSQL?

Show answer

A covering index includes all columns needed by a query so it can be satisfied entirely from the index (index-only scan) without accessing the table. Create with: CREATE INDEX idx_name ON table (filter_col) INCLUDE (col1, col2). The INCLUDE columns are stored in the index but not used for searching.

2. Why does column order matter in a composite index?

Show answer

A composite index on (A, B) efficiently supports queries filtering on A alone or on A and B together, but not on B alone. The index follows a leftmost-prefix rule — it can only skip to a specific value of the first column, then scan within it. Ordering columns by selectivity and query patterns is critical.

3. How does covering index behavior differ between PostgreSQL and MySQL InnoDB?

Show answer

In PostgreSQL, you explicitly create covering indexes with INCLUDE columns. In MySQL InnoDB, every secondary index implicitly includes the primary key columns, and the clustered index (primary key) stores the full row. Look for "Using index" in MySQL EXPLAIN Extra column to confirm an index-only scan.

4. What is the difference between REINDEX and ANALYZE, and when do you run each?

Show answer

REINDEX rebuilds an index from scratch to eliminate bloat (dead space from updates/deletes). ANALYZE updates the planner's statistics about data distribution. Run REINDEX when index bloat causes slow scans or excessive disk usage. Run ANALYZE after bulk loads or major data changes so the planner picks optimal plans.

5. Compare B-tree, hash, GIN, and GiST index types in PostgreSQL.

Show answer

B-tree: default, supports equality and range queries on sortable data. Hash: equality only, smaller than B-tree for that case but not WAL-logged before PG 10.
GIN (Generalized Inverted Index): best for multi-valued columns like arrays, JSONB, and full-text search.
GiST (Generalized Search Tree): best for geometric, range, and proximity queries (e.g., PostGIS, tsquery).

6. What is an expression index and when would you use one?

Show answer

An expression index indexes the result of a function or expression: CREATE INDEX idx ON users (lower(email)). Use it when queries filter on transformed values (e.g., case-insensitive search, date truncation). Without it, PostgreSQL cannot use a regular index on the column because the expression changes the lookup value.