- devops
- l2
- topic-pack
- postgresql
- database-ops

---

Portal | Level: L2: Operations | Topics: PostgreSQL Operations, Database Operations | Domain: DevOps & Tooling
PostgreSQL Operations - Primer¶
Why This Matters¶
PostgreSQL is one of the most widely deployed open-source relational databases, powering everything from small application backends to multi-terabyte analytics workloads. Unlike managed database services that hide operational complexity, running PostgreSQL yourself means you are responsible for vacuuming, WAL management, replication lag, connection pooling, backup/restore, and performance tuning. A missed autovacuum cycle can lead to transaction ID wraparound — a failure mode where the database refuses new writes, effectively going read-only, to prevent data corruption. Understanding these operational concerns is not optional.
Core Concepts¶
1. Essential psql Commands¶
# Connect to a database
psql -h localhost -U postgres -d mydb
# Connect with connection string
psql "postgresql://user:pass@host:5432/mydb?sslmode=require"
# List databases, tables, indexes
\l -- list databases
\dt -- list tables in current schema
\dt+ -- with sizes
\di -- list indexes
\dn -- list schemas
\du -- list roles/users
\d tablename -- describe table structure
\x -- toggle expanded output (useful for wide rows)
# Run a query from the command line
psql -U postgres -d mydb -c "SELECT count(*) FROM orders;"
# Execute a SQL file
psql -U postgres -d mydb -f schema.sql
# Export query to CSV
psql -U postgres -d mydb -c "COPY (SELECT * FROM orders) TO STDOUT WITH CSV HEADER" > orders.csv
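The connection-URI form shown above is easy to get wrong when a password contains URL-reserved characters. A minimal sketch of building a properly encoded libpq URI (Python; the `make_dsn` helper and all values are illustrative, not part of any PostgreSQL tooling):

```python
from urllib.parse import quote

def make_dsn(user: str, password: str, host: str, port: int,
             dbname: str, sslmode: str = "require") -> str:
    # Reserved characters in the password (@, :, /) must be
    # percent-encoded, or psql will misparse the URI.
    return (f"postgresql://{quote(user)}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}?sslmode={sslmode}")

print(make_dsn("app_user", "p@ss:word", "db.example.com", 5432, "mydb"))
# postgresql://app_user:p%40ss%3Aword@db.example.com:5432/mydb?sslmode=require
```

For production credentials, a `~/.pgpass` file or the `PGPASSWORD` environment variable avoids putting the password in the URI at all.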
2. Configuration Tuning¶
PostgreSQL ships with conservative defaults designed to run on minimal hardware. Production tuning is mandatory.
Key parameters in postgresql.conf:
# Memory — start here
shared_buffers = '4GB' # 25% of RAM (default 128MB is too low)
effective_cache_size = '12GB' # 75% of RAM (hint for planner, not allocation)
work_mem = '64MB' # per-sort/hash operation (careful: multiplied by connections * operations)
maintenance_work_mem = '1GB' # for VACUUM, CREATE INDEX
# WAL and checkpoints
wal_buffers = '64MB'
checkpoint_completion_target = 0.9
max_wal_size = '4GB'
min_wal_size = '1GB'
# Connections
max_connections = 200 # keep low, use pgbouncer for pooling
superuser_reserved_connections = 3
# Query planner
random_page_cost = 1.1 # SSD storage (default 4.0 is for spinning disk)
effective_io_concurrency = 200 # SSD (default 1)
# Logging
log_min_duration_statement = 1000 # log queries slower than 1s (ms)
log_checkpoints = on
log_lock_waits = on
log_temp_files = 0 # log all temp file usage
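The memory parameters above interact: `work_mem` is per sort/hash operation, so its worst case scales with connection count. A rough sizing sketch of the rules of thumb in the comments (Python; `memory_plan` is a hypothetical helper, and the percentages are heuristics to be validated by benchmarking, not hard rules):

```python
def memory_plan(ram_gb: float, max_connections: int,
                work_mem_mb: int = 64) -> dict:
    """Rule-of-thumb sizing: 25% of RAM for shared_buffers,
    75% as the planner's cache-size hint."""
    return {
        "shared_buffers_gb": round(ram_gb * 0.25, 1),
        "effective_cache_size_gb": round(ram_gb * 0.75, 1),
        # Worst case: every connection runs one work_mem-sized sort at
        # once; a single complex query can use several multiples of work_mem.
        "worst_case_work_mem_gb": round(max_connections * work_mem_mb / 1024, 1),
    }

print(memory_plan(ram_gb=16, max_connections=200))
# 200 connections x 64MB = 12.5GB worst case on a 16GB box —
# one reason max_connections must stay low
```

This worst-case arithmetic is why the `max_connections` comment below says to keep the number low and pool with PgBouncer instead.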
Apply changes:
# Most settings: reload (no downtime)
sudo systemctl reload postgresql
# OR from psql:
SELECT pg_reload_conf();
# Some settings require restart (shared_buffers, max_connections, etc.)
sudo systemctl restart postgresql
3. Vacuuming and Autovacuum¶
PostgreSQL uses MVCC (Multi-Version Concurrency Control). An UPDATE writes a new row version and leaves the old one in place; a DELETE only marks the row version dead. Dead row versions accumulate until VACUUM reclaims them.
-- Check dead tuples per table
SELECT schemaname, relname, n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC LIMIT 20;
-- Manual vacuum (reclaims space for reuse, does NOT return to OS)
VACUUM VERBOSE orders;
-- Full vacuum (returns space to OS, requires exclusive lock — blocks all queries)
VACUUM FULL orders; -- use only during maintenance windows
-- Check autovacuum workers are running
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
Autovacuum tuning in postgresql.conf:
autovacuum = on
autovacuum_max_workers = 4
autovacuum_vacuum_scale_factor = 0.1 # vacuum when 10% of rows are dead
autovacuum_analyze_scale_factor = 0.05 # analyze when 5% of rows change
autovacuum_vacuum_cost_delay = 2ms # default since PG 12 (was 20ms); lower = faster, more aggressive vacuuming
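Autovacuum fires on a table when dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` (base threshold defaults to 50). A sketch of that formula (Python; `autovacuum_trigger` is a hypothetical helper):

```python
def autovacuum_trigger(reltuples: int, scale_factor: float = 0.1,
                       base_threshold: int = 50) -> int:
    """Dead-tuple count at which autovacuum vacuums a table:
    autovacuum_vacuum_threshold + scale_factor * reltuples."""
    return int(base_threshold + scale_factor * reltuples)

# With the default 0.1 scale factor, a 100M-row table tolerates ~10M
# dead tuples before autovacuum runs:
print(autovacuum_trigger(100_000_000))        # 10000050
print(autovacuum_trigger(100_000_000, 0.01))  # 1000050
```

This is why large tables often get a lower per-table setting, e.g. `ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);`.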
4. Replication Setup¶
Streaming replication (primary/replica):
On primary (postgresql.conf):
wal_level = replica # minimum WAL detail for streaming replication (the default since PG 10)
max_wal_senders = 10 # WAL sender slots for replicas and base backups
wal_keep_size = '1GB' # retain extra WAL for replicas that fall behind
On primary (pg_hba.conf):
host replication replicator <replica_ip>/32 scram-sha-256
# Create replication user on primary
psql -U postgres -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_pass';"
# Bootstrap replica from primary
pg_basebackup -h primary.example.com -U replicator -D /var/lib/postgresql/16/main \
-Fp -Xs -P -R
# -R creates standby.signal and sets primary_conninfo in postgresql.auto.conf
# Start the replica
sudo systemctl start postgresql
Check replication status:
-- On primary: check connected replicas
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- On replica: check WAL receiver status
-- (written_lsn/flushed_lsn replaced received_lsn in PG 13)
SELECT status, written_lsn, flushed_lsn, latest_end_lsn,
latest_end_time, slot_name
FROM pg_stat_wal_receiver;
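An LSN like `16/B374D848` is two hex words: the high word counts 4GB units, the low word is the byte offset within that unit. A sketch of the arithmetic `pg_wal_lsn_diff()` performs server-side (Python; function names are illustrative):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg LSN 'HI/LO' (two hex words) to a byte position:
    HI * 2^32 + LO."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def replay_lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Same result as pg_wal_lsn_diff(sent_lsn, replay_lsn)."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)

print(replay_lag_bytes("16/B374D848", "16/B3740000"))  # 55368 bytes behind
```

Handy when comparing LSNs scraped from monitoring on different hosts, where a server-side function call is not available.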
5. Backup and Restore¶
# Logical backup (single database, portable across versions)
pg_dump -U postgres -Fc mydb > mydb.dump # custom format (compressed)
pg_dump -U postgres -Fp mydb > mydb.sql # plain SQL
pg_dumpall -U postgres > all_databases.sql # all databases + roles
# Restore from custom format
pg_restore -U postgres -d mydb --clean --if-exists mydb.dump
# Restore from plain SQL
psql -U postgres -d mydb -f mydb.sql
# Point-in-time recovery (PITR) with WAL archiving
# postgresql.conf on primary:
archive_mode = on
archive_command = 'cp %p /archive/wal/%f'
# Restore to a specific point in time
# 1. Stop PostgreSQL
# 2. Restore base backup
# 3. Create recovery.signal
# 4. Set in postgresql.auto.conf:
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2026-03-18 14:30:00 UTC'
recovery_target_action = 'promote'
# 5. Start PostgreSQL — it replays WAL up to the target time
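The archived files `%f` refers to are WAL segments named from the timeline and segment number. A sketch mirroring the server's `pg_walfile_name()` (Python; `wal_file_name` is a hypothetical helper assuming the default 16MB segment size), useful for checking which archived segment a recovery target falls in:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segments

def wal_file_name(timeline: int, lsn: int, seg_size: int = WAL_SEG_SIZE) -> str:
    """Segment file name: timeline, then high/low parts of the
    segment number, each as 8 hex digits."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size  # 256 segments per 4GB unit
    return (f"{timeline:08X}"
            f"{segno // segs_per_xlogid:08X}"
            f"{segno % segs_per_xlogid:08X}")

print(wal_file_name(1, 0x3000000))  # 000000010000000000000003
```

On a live server, `SELECT pg_walfile_name(pg_current_wal_lsn());` gives the authoritative answer.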
6. Monitoring Queries¶
-- Active queries and locks
SELECT pid, age(clock_timestamp(), query_start) AS duration, usename,
query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle' AND pid != pg_backend_pid()
ORDER BY duration DESC;
-- Table sizes
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_table_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
-- Index usage (find unused indexes)
SELECT schemaname, relname, indexrelname, idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
-- Cache hit ratio (should be > 99%)
SELECT sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
-- Connection count by state
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
-- Long-running transactions (can block autovacuum)
SELECT pid, age(clock_timestamp(), xact_start) AS xact_duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL AND state != 'idle'
ORDER BY xact_start;
-- Kill a runaway query
SELECT pg_cancel_backend(12345); -- graceful (SIGINT)
SELECT pg_terminate_backend(12345); -- forceful (SIGTERM)
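The cache-hit query above reduces to simple arithmetic; a sketch of the same calculation for use in external monitoring (Python; function name and the 99% rule of thumb are illustrative):

```python
def cache_hit_ratio(heap_blks_hit: int, heap_blks_read: int) -> float:
    """Fraction of table block reads served from shared_buffers.
    Sustained values below ~0.99 on an OLTP workload usually mean
    shared_buffers is too small or the working set outgrew RAM."""
    total = heap_blks_hit + heap_blks_read
    return heap_blks_hit / total if total else 1.0

print(round(cache_hit_ratio(9_950_000, 50_000), 3))  # 0.995 — worth investigating
```

Note the counters are cumulative since the last stats reset, so alert on the ratio over a recent window, not the lifetime value.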
7. Connection Pooling with PgBouncer¶
PostgreSQL forks a process per connection. At 500+ connections, this is expensive. PgBouncer sits in front and multiplexes.
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction # transaction pooling (most common)
max_client_conn = 1000 # max client connections to pgbouncer
default_pool_size = 25 # connections per user/database pair to PostgreSQL
reserve_pool_size = 5 # extra connections for burst
server_idle_timeout = 300
Pool modes:
- session — connection held for entire client session (safest, least efficient)
- transaction — connection returned after each transaction (best for web apps)
- statement — connection returned after each statement (breaks multi-statement transactions)
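In transaction mode, the pool size needed is driven by transaction throughput and duration, not by client count. A sketch of the Little's-law estimate (Python; `required_pool_size` and the example numbers are illustrative):

```python
import math

def required_pool_size(tps: float, avg_txn_ms: float) -> int:
    """Little's law: concurrent server connections needed is roughly
    the arrival rate times how long each transaction holds a
    connection."""
    return math.ceil(tps * avg_txn_ms / 1000)

# 500 transactions/sec at 20ms each needs only ~10 server connections,
# even with 1000 clients connected to pgbouncer:
print(required_pool_size(500, 20))  # 10
```

Add headroom (the `reserve_pool_size` above) for bursts, and remember each user/database pair gets its own pool.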
Quick Reference¶
# Service management
sudo systemctl start/stop/restart/reload postgresql
# Check version
psql -U postgres -c "SELECT version();"
# Check config file location
psql -U postgres -c "SHOW config_file;"
# Check data directory
psql -U postgres -c "SHOW data_directory;"
# Database size
psql -U postgres -c "SELECT pg_size_pretty(pg_database_size('mydb'));"
# WAL location
psql -U postgres -c "SELECT pg_current_wal_lsn();"
# Replication lag in bytes
psql -U postgres -c "SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM pg_stat_replication;"
Wiki Navigation¶
Related Content¶
- AWS Database Flashcards (CLI) (flashcard_deck, L1) — Database Operations
- Database Operations Flashcards (CLI) (flashcard_deck, L1) — Database Operations
- Database Operations on Kubernetes (Topic Pack, L2) — Database Operations
- Database Ops Drills (Drill, L2) — Database Operations
- Interview: Database Failover During Deploy (Scenario, L3) — Database Operations
- Redis Operations (Topic Pack, L2) — Database Operations
- Runbook: Long-Running Query / Lock Contention (Runbook, L2) — PostgreSQL Operations
- Runbook: PostgreSQL Connection Exhaustion (Runbook, L2) — PostgreSQL Operations
- Runbook: PostgreSQL Disk Space Critical (Runbook, L2) — PostgreSQL Operations
- Runbook: PostgreSQL Replication Lag (Runbook, L2) — PostgreSQL Operations