SQL Fundamentals Footguns

Mistakes that cause outages, data corruption, table locks, and slow queries that bring production databases to their knees.


1. Running ALTER TABLE on a Large Table Without Understanding Locks

You run ALTER TABLE orders ADD COLUMN priority INTEGER DEFAULT 0 on a table with 50 million rows during business hours. On PostgreSQL < 11, this rewrites the entire table while holding an ACCESS EXCLUSIVE lock. All queries against the table block. Your application connection pool fills up in seconds. Every endpoint that touches the orders table returns 500 errors for the 10 minutes the ALTER takes.

Fix: On PostgreSQL 11+, adding a column with a simple constant default is metadata-only and fast. On older versions or for complex defaults: add the column as NULL (instant), then backfill in batches, then set the default. Always test ALTER TABLE statements on a staging database of equivalent size first. Always run schema changes during low-traffic windows.
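The three-step pattern for older versions can be sketched as follows (table, column, and batch size are illustrative; tune the batch size to your workload):

-- Step 1: add the column with no default (metadata-only, instant)
ALTER TABLE orders ADD COLUMN priority INTEGER;

-- Step 2: backfill in batches so no single transaction rewrites millions of rows
UPDATE orders SET priority = 0
WHERE id IN (
    SELECT id FROM orders WHERE priority IS NULL LIMIT 10000
);
-- repeat until the UPDATE reports 0 rows

-- Step 3: set the default for future inserts (metadata-only)
ALTER TABLE orders ALTER COLUMN priority SET DEFAULT 0;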


2. Missing Index on a JOIN Column

Your query joins orders to customers on customer_id. The orders table has no index on customer_id. For each customer row, PostgreSQL scans the entire orders table. With 10,000 customers and 5 million orders, this produces 50 billion row comparisons. A query that should take 50ms takes 15 minutes. The database CPU spikes to 100%.

Fix: Index every foreign key column and every column used in WHERE and JOIN clauses. Check with EXPLAIN ANALYZE — if you see Seq Scan on a large table inside a Nested Loop, a missing index is almost certainly the cause:

CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);

Use CONCURRENTLY to avoid locking the table during index creation.


3. Using SELECT * in Production Queries

Your application runs SELECT * FROM orders WHERE id = 123. The table has 40 columns including a 500KB metadata JSONB column and a document bytea column. The query returns one row but transfers 500KB of data you do not need. Multiply by 10,000 requests/second and you are saturating the network between the app server and the database.

Fix: Always specify the columns you need: SELECT id, status, total FROM orders WHERE id = 123. This also enables covering index scans — if an index contains all requested columns, PostgreSQL never reads the table at all. SELECT * is acceptable only for ad-hoc investigation with LIMIT, never in application code.
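As a sketch of the covering-index point, on PostgreSQL 11+ an index can INCLUDE non-key columns so a narrow query is answered entirely from the index (an Index Only Scan), assuming the visibility map is reasonably current. The index name and lookup query here are hypothetical:

-- Hypothetical covering index for a frequent lookup by status
CREATE INDEX idx_orders_status_covering
    ON orders (status) INCLUDE (customer_id, total);

-- EXPLAIN should now show "Index Only Scan" with no heap access
EXPLAIN SELECT customer_id, total FROM orders WHERE status = 'pending';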


4. Forgetting That NULL Breaks Equality and NOT IN

You write SELECT * FROM orders WHERE status NOT IN (SELECT status FROM excluded_statuses). The excluded_statuses table has a row with NULL status. SQL's three-valued logic means NOT IN with any NULL in the subquery returns zero rows — always. Your query returns nothing. You think there are no orders. There are 5 million.

Fix: Use NOT EXISTS instead of NOT IN when the subquery might contain NULLs:

-- SAFE: NOT EXISTS is not affected by NULLs
SELECT * FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM excluded_statuses e WHERE e.status = o.status
);

Or filter NULLs from the subquery: NOT IN (SELECT status FROM excluded_statuses WHERE status IS NOT NULL).


5. Implicit Type Coercion Killing Index Usage

Your customer_id column is an integer. Your application sends the query as WHERE customer_id = '12345' (a string). PostgreSQL coerces the string to integer and uses the index. But MySQL may cast the column instead of the value, resulting in a full table scan. Or your ORM sends a numeric column comparison as a string, and the query planner cannot use the index.

Fix: Always match the type of the value to the type of the column. In application code, use parameterized queries with explicit types. Check with EXPLAIN: if a query on an indexed column shows Seq Scan, a type mismatch is a common cause. In PostgreSQL, WHERE integer_col = '12345' works because the string literal is resolved to an integer, but comparing a text column to a bare integer is rejected outright (no text = integer operator). In MySQL, WHERE varchar_col = 12345 silently compares both sides as numbers, forcing a conversion on every row and preventing index use.
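A quick way to check and fix this in PostgreSQL, sketched with illustrative names:

-- Indexed integer column: the string literal is resolved to integer,
-- EXPLAIN should show an Index Scan
EXPLAIN SELECT * FROM orders WHERE customer_id = '12345';

-- When types genuinely differ, cast the VALUE, never the column:
SELECT * FROM orders WHERE customer_id = '12345'::integer;   -- index usable

-- Casting the column defeats the index: the expression must be
-- evaluated for every row
SELECT * FROM orders WHERE customer_id::text = '12345';      -- Seq Scan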


6. Running UPDATE or DELETE Without a WHERE Clause

You intend to update one row: UPDATE orders SET status = 'cancelled'. You forget the WHERE clause. Every row in the table is updated. If you are in a transaction, you can ROLLBACK. If autocommit is on (the default in many clients), the change is permanent. 50 million orders are now marked as cancelled.

Fix: Develop UPDATE and DELETE statements in two steps: (1) write a SELECT with the intended WHERE clause and verify it returns exactly the rows you expect, (2) only then change the SELECT to UPDATE/DELETE. Use a transaction wrapper:

BEGIN;
SELECT id, status FROM orders WHERE id = 12345;  -- verify first
UPDATE orders SET status = 'cancelled' WHERE id = 12345;
-- Inspect the result, then:
COMMIT;  -- or ROLLBACK if wrong

Configure your SQL client to require explicit transactions for writes. In psql: \set AUTOCOMMIT off.


7. Giant Backfill UPDATE in a Single Transaction

You need to backfill a new column on a 100 million row table. You run: UPDATE orders SET new_col = old_col * 1.1. This creates a single transaction that locks or writes 100 million rows. The transaction log (WAL in PostgreSQL, binlog in MySQL) grows by gigabytes. Replication lag spikes. On MVCC databases, dead tuples pile up requiring a massive vacuum. The database may run out of disk space.

Fix: Backfill in small batches with pauses between them:

-- Batch update pattern
UPDATE orders SET new_col = old_col * 1.1
WHERE id IN (
    SELECT id FROM orders WHERE new_col IS NULL LIMIT 10000
);
-- Repeat until no rows remain
-- Add a 100ms pause between batches to let replication catch up

Script this in a loop with pg_sleep(0.1) between iterations. Monitor replication lag during the backfill.
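A minimal sketch of that loop in PL/pgSQL, with illustrative table and column names. COMMIT inside a DO block requires PostgreSQL 11+ and the block must be run outside an explicit transaction:

DO $$
DECLARE
    rows_updated integer;
BEGIN
    LOOP
        UPDATE orders SET new_col = old_col * 1.1
        WHERE id IN (
            SELECT id FROM orders WHERE new_col IS NULL LIMIT 10000
        );
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
        COMMIT;                  -- release locks and flush WAL per batch
        PERFORM pg_sleep(0.1);   -- let replication catch up
    END LOOP;
END $$;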


8. Creating Indexes Without CONCURRENTLY

You create an index on a busy production table: CREATE INDEX idx_orders_date ON orders(created_at). This acquires a lock that blocks all writes to the table for the duration of index creation. On a 50 million row table, this takes 5-15 minutes. All INSERT, UPDATE, and DELETE operations queue up. The application connection pool is exhausted in seconds.

Fix: Use CREATE INDEX CONCURRENTLY in PostgreSQL:

CREATE INDEX CONCURRENTLY idx_orders_date ON orders(created_at);

This takes longer but does not block writes. Caveats: CONCURRENTLY cannot run inside a transaction, and if it fails, it leaves behind an invalid index that must be dropped and recreated. Always check for invalid indexes afterwards: SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;


9. Using OFFSET for Pagination on Large Tables

Your API paginates results with LIMIT 20 OFFSET 100000. The database reads and discards 100,000 rows before returning 20. As users paginate deeper, performance degrades linearly. Page 5000 requires scanning 100,000 rows. The query that took 5ms on page 1 takes 5 seconds on page 5000.

Fix: Use keyset pagination (cursor-based) instead of OFFSET:

-- Instead of: SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000

-- Use: remember the last ID from the previous page
SELECT * FROM orders
WHERE id > 98765  -- last ID from previous page
ORDER BY id
LIMIT 20;

This uses the index to jump directly to the starting point. Performance is constant regardless of page depth. The tradeoff: you cannot jump to arbitrary page numbers (page 500), but in practice, API consumers rarely need this.
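If the sort key is not unique (for example, paginating by created_at), add a unique tie-breaker column and use a row comparison, which PostgreSQL can satisfy with a composite index. A sketch with illustrative values:

-- Keyset pagination over a non-unique column: (created_at, id) row comparison
SELECT * FROM orders
WHERE (created_at, id) > ('2024-01-15 10:00:00', 98765)  -- last row of previous page
ORDER BY created_at, id
LIMIT 20;

-- Needs a matching composite index to stay fast
CREATE INDEX idx_orders_created_id ON orders (created_at, id);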


10. Ignoring EXPLAIN ANALYZE Output

A query is slow. You add more indexes, change query structure, increase work_mem — all without running EXPLAIN ANALYZE. You are guessing. The actual problem is a correlated subquery that runs once per row in the outer query, executing 50,000 subqueries. An index would not help. Rewriting as a JOIN solves it instantly.

Fix: EXPLAIN ANALYZE is the single most important debugging tool for slow queries. Read the output:

EXPLAIN ANALYZE SELECT ...;

-- Read bottom-up. Key things to look for:
-- "Seq Scan" on large table     → needs an index
-- "actual rows" >> "rows"       → stale statistics (run ANALYZE)
-- "Nested Loop" with high count → consider hash or merge join
-- "Sort Method: external merge" → needs more work_mem
-- "SubPlan" inside a loop       → rewrite as a JOIN

11. Dropping a Column That Has Dependent Objects

You drop a column: ALTER TABLE orders DROP COLUMN legacy_status. A view v_order_summary depends on this column. The DROP fails with a dependency error. You add CASCADE: ALTER TABLE orders DROP COLUMN legacy_status CASCADE. The CASCADE silently drops the view, a function that uses the view, and a materialized view that depends on the function. Three downstream reporting dashboards break.

Fix: Before dropping a column, check dependencies:

-- PostgreSQL: find views that depend on the orders table
-- (pg_depend holds one row per referenced column, hence DISTINCT)
SELECT DISTINCT dependent_ns.nspname AS dependent_schema,
       dependent_view.relname AS dependent_view
FROM pg_depend
JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid
JOIN pg_class AS dependent_view ON pg_rewrite.ev_class = dependent_view.oid
JOIN pg_namespace dependent_ns ON dependent_view.relnamespace = dependent_ns.oid
WHERE pg_depend.refobjid = 'orders'::regclass;

Never use CASCADE in production without understanding the full dependency chain. Drop or recreate dependent objects explicitly.
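The explicit approach can be sketched as follows, assuming a single dependent view named v_order_summary; the recreated definition is a placeholder you fill in after removing references to the dropped column:

BEGIN;
-- Capture the current definition so the view can be recreated
SELECT pg_get_viewdef('v_order_summary'::regclass, true);

DROP VIEW v_order_summary;
ALTER TABLE orders DROP COLUMN legacy_status;

-- Recreate from the saved definition, minus the dropped column
CREATE VIEW v_order_summary AS
SELECT ...;  -- edited definition goes here
COMMIT;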


12. Not Filtering on the Timestamp Column in Time-Series Queries

Your access log table has 500 million rows. You query: SELECT COUNT(*) FROM access_log WHERE path = '/api/health'. The path column has an index but the query still scans millions of rows because /api/health is the most common path. If you had added AND timestamp > NOW() - INTERVAL '1 hour', the timestamp index would narrow the scan to a few hundred thousand rows first.

Fix: For time-series tables, always include a timestamp filter:

-- BAD: scans all matching rows across all time
SELECT path, COUNT(*) FROM access_log
WHERE status >= 500 GROUP BY path;

-- GOOD: narrows to a time window first (uses timestamp index)
SELECT path, COUNT(*) FROM access_log
WHERE timestamp > NOW() - INTERVAL '1 hour'
  AND status >= 500
GROUP BY path;

Most time-series tables are partitioned or indexed by timestamp. Omitting the timestamp filter defeats the partitioning strategy and forces a full table scan.