PostgreSQL Operations - Primer

Why This Matters

PostgreSQL is one of the most widely deployed open-source relational databases, powering everything from small application backends to multi-terabyte analytics workloads. Unlike managed database services that hide operational complexity, running PostgreSQL yourself means you are responsible for vacuuming, WAL management, replication lag, connection pooling, backup/restore, and performance tuning. A missed autovacuum cycle can lead to transaction ID wraparound, a failure mode in which the server stops accepting new write transactions to prevent data corruption. Understanding these operational concerns is not optional.
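The wraparound margin can be sanity-checked with simple arithmetic. The sketch below assumes a hypothetical age(datfrozenxid) value (in practice you would read it with psql -c "SELECT datname, age(datfrozenxid) FROM pg_database"); the thresholds are PostgreSQL's defaults, stated approximately.

```shell
# Sketch: how close a database is to forced anti-wraparound vacuuming.
xid_age=450000000         # hypothetical age(datfrozenxid)
freeze_max_age=200000000  # autovacuum_freeze_max_age default
hard_limit=2000000000     # ~2 billion XIDs; writes stop shortly before this

pct_used=$(( xid_age * 100 / hard_limit ))
echo "XID age: ${xid_age} (${pct_used}% of hard limit)"
if [ "$xid_age" -gt "$freeze_max_age" ]; then
  echo "aggressive anti-wraparound autovacuum will be forced"
fi
```

Once age crosses autovacuum_freeze_max_age (200 million by default), autovacuum launches anti-wraparound vacuums regardless of the scale-factor settings.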

Core Concepts

1. Essential psql Commands

# Connect to a database
psql -h localhost -U postgres -d mydb

# Connect with connection string
psql "postgresql://user:pass@host:5432/mydb?sslmode=require"

# List databases, tables, indexes
\l          -- list databases
\dt         -- list tables in current schema
\dt+        -- with sizes
\di         -- list indexes
\dn         -- list schemas
\du         -- list roles/users
\d tablename  -- describe table structure
\x          -- toggle expanded output (useful for wide rows)

# Run a query from the command line
psql -U postgres -d mydb -c "SELECT count(*) FROM orders;"

# Execute a SQL file
psql -U postgres -d mydb -f schema.sql

# Export query to CSV
psql -U postgres -d mydb -c "COPY (SELECT * FROM orders) TO STDOUT WITH CSV HEADER" > orders.csv
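To keep passwords out of connection strings like the one above, libpq can read them from a password file (hostname:port:database:username:password, one entry per line, * wildcards allowed). A sketch that writes a demo file to a temporary path; normally the file lives at ~/.pgpass, and the hostnames and passwords here are placeholders:

```shell
# Store credentials in a pgpass file so connection strings stay password-free.
# Format: hostname:port:database:username:password  (* wildcards allowed)
PGPASSFILE="${TMPDIR:-/tmp}/pgpass.demo"   # demo path; normally ~/.pgpass
cat > "$PGPASSFILE" <<'EOF'
host.example.com:5432:mydb:app_user:secure_pass
*:5432:*:replicator:replica_pass
EOF
chmod 600 "$PGPASSFILE"                    # libpq ignores the file otherwise
ls -l "$PGPASSFILE"
# Then: PGPASSFILE="$PGPASSFILE" psql -h host.example.com -U app_user -d mydb
```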

2. Configuration Tuning

PostgreSQL ships with conservative defaults designed to run on minimal hardware. Production tuning is mandatory.

Key parameters in postgresql.conf:

# Memory — start here
shared_buffers = '4GB'            # 25% of RAM (default 128MB is too low)
effective_cache_size = '12GB'     # 75% of RAM (hint for planner, not allocation)
work_mem = '64MB'                 # per-sort/hash operation (careful: multiplied by connections * operations)
maintenance_work_mem = '1GB'      # for VACUUM, CREATE INDEX

# WAL and checkpoints
wal_buffers = '64MB'
checkpoint_completion_target = 0.9
max_wal_size = '4GB'
min_wal_size = '1GB'

# Connections
max_connections = 200             # keep low, use pgbouncer for pooling
superuser_reserved_connections = 3

# Query planner
random_page_cost = 1.1            # SSD storage (default 4.0 is for spinning disk)
effective_io_concurrency = 200    # SSD (default 1)

# Logging
log_min_duration_statement = 1000  # log queries slower than 1s (ms)
log_checkpoints = on
log_lock_waits = on
log_temp_files = 0                 # log all temp file usage
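The work_mem warning above deserves arithmetic: every sort or hash node in a running plan may use up to work_mem, so the theoretical worst case is connections times operations times work_mem. A rough budget check using the example values (the RAM size and operations-per-query count are assumptions):

```shell
# Rough worst-case memory budget for the example settings above.
ram_mb=16384            # assumed 16 GB host
shared_buffers_mb=4096  # 25% of RAM
work_mem_mb=64
max_connections=200
ops_per_query=2         # assumed sorts/hashes per query; can be higher

worst_case_mb=$(( max_connections * ops_per_query * work_mem_mb ))
echo "worst-case work_mem: ${worst_case_mb} MB"   # 25600 MB: exceeds RAM!
echo "headroom after shared_buffers: $(( ram_mb - shared_buffers_mb )) MB"
```

In practice most connections are idle, but the exercise shows why a high work_mem must be paired with a low max_connections and a pooler.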

Apply changes:

# Most settings: reload (no downtime)
sudo systemctl reload postgresql
# OR from psql:
SELECT pg_reload_conf();

# Some settings require restart (shared_buffers, max_connections, etc.)
sudo systemctl restart postgresql

3. Vacuuming and Autovacuum

PostgreSQL uses MVCC (Multi-Version Concurrency Control). Updates and deletes do not remove old row versions — they create new versions. VACUUM reclaims dead rows.

-- Check dead tuples per table
SELECT schemaname, relname, n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC LIMIT 20;

-- Manual vacuum (reclaims space for reuse, does NOT return to OS)
VACUUM VERBOSE orders;

-- Full vacuum (returns space to OS, requires exclusive lock — blocks all queries)
VACUUM FULL orders;  -- use only during maintenance windows

-- Check that autovacuum workers are running
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';

Autovacuum tuning in postgresql.conf:

autovacuum = on
autovacuum_max_workers = 4
autovacuum_vacuum_scale_factor = 0.1    # vacuum when 10% of rows are dead (default 0.2)
autovacuum_analyze_scale_factor = 0.05  # analyze when 5% of rows change (default 0.1)
autovacuum_vacuum_cost_delay = 2ms      # throttling delay (default 2ms; lower = more aggressive)
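These scale factors translate into a per-table trigger point: a table is vacuumed once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A sketch with the values above (the table sizes are arbitrary examples):

```shell
# When does autovacuum kick in for a table of a given size?
# trigger = autovacuum_vacuum_threshold + scale_factor * reltuples
base_threshold=50          # autovacuum_vacuum_threshold default
scale_factor_pct=10        # 0.1 from the config above, as a percentage
for reltuples in 1000 1000000 100000000; do
  trigger=$(( base_threshold + reltuples * scale_factor_pct / 100 ))
  echo "table with ${reltuples} rows: vacuum after ~${trigger} dead tuples"
done
```

Note how the absolute threshold grows with table size; for very large tables it is common to lower the factor per table, e.g. ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);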

4. Replication Setup

Streaming replication (primary/replica):

On primary (postgresql.conf):

wal_level = replica
max_wal_senders = 5
wal_keep_size = '1GB'

On primary (pg_hba.conf):

host replication replicator 10.0.0.0/24 scram-sha-256

# Create replication user on primary
psql -U postgres -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_pass';"

# Bootstrap replica from primary
pg_basebackup -h primary.example.com -U replicator -D /var/lib/postgresql/16/main \
  -Fp -Xs -P -R
# -R creates standby.signal and sets primary_conninfo in postgresql.auto.conf

# Start the replica
sudo systemctl start postgresql

Check replication status:

-- On primary: check connected replicas
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On replica: check WAL receiver status
-- (received_lsn was split into written_lsn/flushed_lsn in PostgreSQL 13)
SELECT status, written_lsn, flushed_lsn, latest_end_lsn,
       latest_end_time, slot_name
FROM pg_stat_wal_receiver;
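pg_wal_lsn_diff is plain subtraction once you treat an LSN (written as two hex halves, high/low) as a 64-bit byte position in the WAL stream. A shell sketch with hypothetical LSN values:

```shell
# Convert an LSN ("high/low" in hex) to an absolute byte position.
lsn_to_bytes() {
  local high low
  high=${1%%/*}; low=${1##*/}
  echo $(( (16#$high << 32) + 16#$low ))
}
sent_lsn="0/5000060"       # hypothetical sent_lsn on the primary
replay_lsn="0/4FFF028"     # hypothetical replay_lsn on the replica
lag=$(( $(lsn_to_bytes "$sent_lsn") - $(lsn_to_bytes "$replay_lsn") ))
echo "replay lag: ${lag} bytes"   # 0x5000060 - 0x4FFF028 = 4152
```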

5. Backup and Restore

# Logical backup (single database, portable across versions)
pg_dump -U postgres -Fc mydb > mydb.dump        # custom format (compressed)
pg_dump -U postgres -Fp mydb > mydb.sql          # plain SQL
pg_dumpall -U postgres > all_databases.sql       # all databases + roles

# Restore from custom format
pg_restore -U postgres -d mydb --clean --if-exists mydb.dump

# Restore from plain SQL
psql -U postgres -d mydb -f mydb.sql

# Point-in-time recovery (PITR) with WAL archiving
# postgresql.conf on primary:
archive_mode = on
archive_command = 'test ! -f /archive/wal/%f && cp %p /archive/wal/%f'  # never overwrite an existing segment

# Restore to a specific point in time
# 1. Stop PostgreSQL
# 2. Restore base backup
# 3. Create recovery.signal
# 4. Set in postgresql.auto.conf:
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2026-03-18 14:30:00 UTC'
recovery_target_action = 'promote'
# 5. Start PostgreSQL — it replays WAL up to the target time

6. Monitoring Queries

-- Active queries and locks
SELECT pid, age(clock_timestamp(), query_start) AS duration, usename,
       query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle' AND pid != pg_backend_pid()
ORDER BY duration DESC;

-- Table sizes
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
       pg_size_pretty(pg_table_size(schemaname || '.' || tablename)) AS table_size,
       pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;

-- Index usage (find unused indexes)
SELECT schemaname, relname AS tablename, indexrelname AS indexname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;

-- Cache hit ratio (should be > 99%)
SELECT sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
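As a worked example of the ratio above, with hypothetical counter values (hits and reads are cumulative block counts since the last stats reset):

```shell
# Cache hit ratio = hits / (hits + reads), from hypothetical counters.
heap_blks_hit=995000
heap_blks_read=5000
# Integer math in shell; multiply first to keep two decimal places.
ratio=$(( heap_blks_hit * 10000 / (heap_blks_hit + heap_blks_read) ))
echo "cache hit ratio: $(( ratio / 100 )).$(( ratio % 100 ))%"   # 99.50%
```

A ratio persistently below 99% on an OLTP workload usually means shared_buffers is too small or the working set has outgrown RAM.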

-- Connection count by state
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

-- Long-running transactions (can block autovacuum)
SELECT pid, age(clock_timestamp(), xact_start) AS xact_duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL AND state != 'idle'
ORDER BY xact_start;

-- Kill a runaway query
SELECT pg_cancel_backend(12345);   -- graceful (SIGINT)
SELECT pg_terminate_backend(12345); -- forceful (SIGTERM)

7. Connection Pooling with PgBouncer

PostgreSQL forks a backend process per connection, so hundreds of mostly idle connections waste memory and scheduler time. PgBouncer sits in front and multiplexes many client connections onto a small pool of server connections.

# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction    # transaction pooling (most common)
max_client_conn = 1000     # max client connections to pgbouncer
default_pool_size = 25     # connections per user/database pair to PostgreSQL
reserve_pool_size = 5      # extra connections for burst
server_idle_timeout = 300

Pool modes:
- session: connection held for the entire client session (safest, least efficient)
- transaction: connection returned after each transaction (best for web apps; session state such as prepared statements and advisory locks does not survive)
- statement: connection returned after each statement (breaks multi-statement transactions)
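These numbers interact: PgBouncer opens up to default_pool_size + reserve_pool_size server connections per user/database pair, and the total across pairs must fit under PostgreSQL's max_connections. A sizing sketch (the number of user/database pairs is an assumption):

```shell
# Server-side connections PgBouncer may open vs. PostgreSQL's limit.
default_pool_size=25
reserve_pool_size=5
user_db_pairs=4            # assumed distinct user/database pairs
max_connections=200
needed=$(( user_db_pairs * (default_pool_size + reserve_pool_size) ))
echo "peak server connections: ${needed} of ${max_connections}"
[ "$needed" -le "$max_connections" ] && echo "fits under max_connections"
```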

Quick Reference

# Service management
sudo systemctl start/stop/restart/reload postgresql

# Check version
psql -U postgres -c "SELECT version();"

# Check config file location
psql -U postgres -c "SHOW config_file;"

# Check data directory
psql -U postgres -c "SHOW data_directory;"

# Database size
psql -U postgres -c "SELECT pg_size_pretty(pg_database_size('mydb'));"

# WAL location
psql -U postgres -c "SELECT pg_current_wal_lsn();"

# Replication lag in bytes
psql -U postgres -c "SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM pg_stat_replication;"
