- devops
- l2
- topic-pack
- postgresql
- database-ops

---

Portal | Level: L2: Operations | Topics: PostgreSQL Operations, Database Operations | Domain: DevOps & Tooling
PostgreSQL Operations - Primer¶
Why This Matters¶
PostgreSQL is one of the most widely deployed open-source relational databases, powering everything from small application backends to multi-terabyte analytics workloads. Unlike managed database services that hide operational complexity, running PostgreSQL yourself means you are responsible for vacuuming, WAL management, replication lag, connection pooling, backup/restore, and performance tuning. A missed autovacuum cycle can lead to transaction ID wraparound — a failure mode where the database refuses new writes, effectively going read-only, to prevent data corruption. Understanding these operational concerns is not optional.
Core Concepts¶
1. Essential psql Commands¶
# Connect to a database
psql -h localhost -U postgres -d mydb
# Connect with connection string
psql "postgresql://user:pass@host:5432/mydb?sslmode=require"
# List databases, tables, indexes
\l -- list databases
\dt -- list tables in current schema
\dt+ -- with sizes
\di -- list indexes
\dn -- list schemas
\du -- list roles/users
\d tablename -- describe table structure
\x -- toggle expanded output (useful for wide rows)
# Run a query from the command line
psql -U postgres -d mydb -c "SELECT count(*) FROM orders;"
# Execute a SQL file
psql -U postgres -d mydb -f schema.sql
# Export query to CSV
psql -U postgres -d mydb -c "COPY (SELECT * FROM orders) TO STDOUT WITH CSV HEADER" > orders.csv
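The connection-URI form shown above is easy to get wrong when a password contains URL-reserved characters. A minimal sketch of building a properly encoded libpq URI (Python; the `make_dsn` helper and all values are illustrative, not part of any PostgreSQL tooling):

```python
from urllib.parse import quote

def make_dsn(user: str, password: str, host: str, port: int,
             dbname: str, sslmode: str = "require") -> str:
    # Reserved characters in the password (@, :, /) must be
    # percent-encoded, or psql will misparse the URI.
    return (f"postgresql://{quote(user)}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}?sslmode={sslmode}")

print(make_dsn("app_user", "p@ss:word", "db.example.com", 5432, "mydb"))
# postgresql://app_user:p%40ss%3Aword@db.example.com:5432/mydb?sslmode=require
```

For production credentials, a `~/.pgpass` file or the `PGPASSWORD` environment variable avoids putting the password in the URI at all.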
2. Configuration Tuning¶
PostgreSQL ships with conservative defaults designed to run on minimal hardware. Production tuning is mandatory.
Key parameters in postgresql.conf:
# Memory — start here
shared_buffers = '4GB' # 25% of RAM (default 128MB is too low)
effective_cache_size = '12GB' # 75% of RAM (hint for planner, not allocation)
work_mem = '64MB' # per-sort/hash operation (careful: multiplied by connections * operations)
maintenance_work_mem = '1GB' # for VACUUM, CREATE INDEX
# WAL and checkpoints
wal_buffers = '64MB'
checkpoint_completion_target = 0.9
max_wal_size = '4GB'
min_wal_size = '1GB'
# Connections
max_connections = 200 # keep low, use pgbouncer for pooling
superuser_reserved_connections = 3
# Query planner
random_page_cost = 1.1 # SSD storage (default 4.0 is for spinning disk)
effective_io_concurrency = 200 # SSD (default 1)
# Logging
log_min_duration_statement = 1000 # log queries slower than 1s (ms)
log_checkpoints = on
log_lock_waits = on
log_temp_files = 0 # log all temp file usage
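The memory parameters above interact: `work_mem` is per sort/hash operation, so its worst case scales with connection count. A rough sizing sketch of the rules of thumb in the comments (Python; `memory_plan` is a hypothetical helper, and the percentages are heuristics to be validated by benchmarking, not hard rules):

```python
def memory_plan(ram_gb: float, max_connections: int,
                work_mem_mb: int = 64) -> dict:
    """Rule-of-thumb sizing: 25% of RAM for shared_buffers,
    75% as the planner's cache-size hint."""
    return {
        "shared_buffers_gb": round(ram_gb * 0.25, 1),
        "effective_cache_size_gb": round(ram_gb * 0.75, 1),
        # Worst case: every connection runs one work_mem-sized sort at
        # once; a single complex query can use several multiples of work_mem.
        "worst_case_work_mem_gb": round(max_connections * work_mem_mb / 1024, 1),
    }

print(memory_plan(ram_gb=16, max_connections=200))
# 200 connections x 64MB = 12.5GB worst case on a 16GB box —
# one reason max_connections must stay low
```

This worst-case arithmetic is why the `max_connections` comment below says to keep the number low and pool with PgBouncer instead.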
Apply changes:
# Most settings: reload (no downtime)
sudo systemctl reload postgresql
# OR from psql:
SELECT pg_reload_conf();
# Some settings require restart (shared_buffers, max_connections, etc.)
sudo systemctl restart postgresql
3. Vacuuming and Autovacuum¶
PostgreSQL uses MVCC (Multi-Version Concurrency Control). An UPDATE writes a new row version and leaves the old one in place; a DELETE only marks the row version dead. Dead row versions accumulate until VACUUM reclaims them.
-- Check dead tuples per table
SELECT schemaname, relname, n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC LIMIT 20;
-- Manual vacuum (reclaims space for reuse, does NOT return to OS)
VACUUM VERBOSE orders;
-- Full vacuum (returns space to OS, requires exclusive lock — blocks all queries)
VACUUM FULL orders; -- use only during maintenance windows
-- Check autovacuum workers are running
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
Autovacuum tuning in postgresql.conf:
autovacuum = on
autovacuum_max_workers = 4
autovacuum_vacuum_scale_factor = 0.1 # vacuum when 10% of rows are dead
autovacuum_analyze_scale_factor = 0.05 # analyze when 5% of rows change
autovacuum_vacuum_cost_delay = 2ms # default since PG 12 (was 20ms); lower = faster, more aggressive vacuuming
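Autovacuum fires on a table when dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` (base threshold defaults to 50). A sketch of that formula (Python; `autovacuum_trigger` is a hypothetical helper):

```python
def autovacuum_trigger(reltuples: int, scale_factor: float = 0.1,
                       base_threshold: int = 50) -> int:
    """Dead-tuple count at which autovacuum vacuums a table:
    autovacuum_vacuum_threshold + scale_factor * reltuples."""
    return int(base_threshold + scale_factor * reltuples)

# With the default 0.1 scale factor, a 100M-row table tolerates ~10M
# dead tuples before autovacuum runs:
print(autovacuum_trigger(100_000_000))        # 10000050
print(autovacuum_trigger(100_000_000, 0.01))  # 1000050
```

This is why large tables often get a lower per-table setting, e.g. `ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);`.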
4. Replication Setup¶
Streaming replication (primary/replica):
On primary (postgresql.conf):
wal_level = replica # minimum WAL detail for streaming replication (the default since PG 10)
max_wal_senders = 10 # WAL sender slots for replicas and base backups
wal_keep_size = '1GB' # retain extra WAL for replicas that fall behind
On primary (pg_hba.conf):
host replication replicator <replica_ip>/32 scram-sha-256
# Create replication user on primary
psql -U postgres -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_pass';"
# Bootstrap replica from primary
pg_basebackup -h primary.example.com -U replicator -D /var/lib/postgresql/16/main \
-Fp -Xs -P -R
# -R creates standby.signal and sets primary_conninfo in postgresql.auto.conf
# Start the replica
sudo systemctl start postgresql
Check replication status:
-- On primary: check connected replicas
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- On replica: check WAL receiver status
-- (written_lsn/flushed_lsn replaced received_lsn in PG 13)
SELECT status, written_lsn, flushed_lsn, latest_end_lsn,
latest_end_time, slot_name
FROM pg_stat_wal_receiver;
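An LSN like `16/B374D848` is two hex words: the high word counts 4GB units, the low word is the byte offset within that unit. A sketch of the arithmetic `pg_wal_lsn_diff()` performs server-side (Python; function names are illustrative):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg LSN 'HI/LO' (two hex words) to a byte position:
    HI * 2^32 + LO."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def replay_lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Same result as pg_wal_lsn_diff(sent_lsn, replay_lsn)."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)

print(replay_lag_bytes("16/B374D848", "16/B3740000"))  # 55368 bytes behind
```

Handy when comparing LSNs scraped from monitoring on different hosts, where a server-side function call is not available.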
5. Backup and Restore¶
# Logical backup (single database, portable across versions)
pg_dump -U postgres -Fc mydb > mydb.dump # custom format (compressed)
pg_dump -U postgres -Fp mydb > mydb.sql # plain SQL
pg_dumpall -U postgres > all_databases.sql # all databases + roles
# Restore from custom format
pg_restore -U postgres -d mydb --clean --if-exists mydb.dump
# Restore from plain SQL
psql -U postgres -d mydb -f mydb.sql
# Point-in-time recovery (PITR) with WAL archiving
# postgresql.conf on primary:
archive_mode = on
archive_command = 'cp %p /archive/wal/%f'
# Restore to a specific point in time
# 1. Stop PostgreSQL
# 2. Restore base backup
# 3. Create recovery.signal
# 4. Set in postgresql.auto.conf:
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2026-03-18 14:30:00 UTC'
recovery_target_action = 'promote'
# 5. Start PostgreSQL — it replays WAL up to the target time
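The archived files `%f` refers to are WAL segments named from the timeline and segment number. A sketch mirroring the server's `pg_walfile_name()` (Python; `wal_file_name` is a hypothetical helper assuming the default 16MB segment size), useful for checking which archived segment a recovery target falls in:

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segments

def wal_file_name(timeline: int, lsn: int, seg_size: int = WAL_SEG_SIZE) -> str:
    """Segment file name: timeline, then high/low parts of the
    segment number, each as 8 hex digits."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size  # 256 segments per 4GB unit
    return (f"{timeline:08X}"
            f"{segno // segs_per_xlogid:08X}"
            f"{segno % segs_per_xlogid:08X}")

print(wal_file_name(1, 0x3000000))  # 000000010000000000000003
```

On a live server, `SELECT pg_walfile_name(pg_current_wal_lsn());` gives the authoritative answer.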
6. Monitoring Queries¶
-- Active queries and locks
SELECT pid, age(clock_timestamp(), query_start) AS duration, usename,
query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle' AND pid != pg_backend_pid()
ORDER BY duration DESC;
-- Table sizes
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_table_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
-- Index usage (find unused indexes)
SELECT schemaname, relname, indexrelname, idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
-- Cache hit ratio (should be > 99%)
SELECT sum(heap_blks_hit) / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
-- Connection count by state
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
-- Long-running transactions (can block autovacuum)
SELECT pid, age(clock_timestamp(), xact_start) AS xact_duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL AND state != 'idle'
ORDER BY xact_start;
-- Kill a runaway query
SELECT pg_cancel_backend(12345); -- graceful (SIGINT)
SELECT pg_terminate_backend(12345); -- forceful (SIGTERM)
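The cache-hit query above reduces to simple arithmetic; a sketch of the same calculation for use in external monitoring (Python; function name and the 99% rule of thumb are illustrative):

```python
def cache_hit_ratio(heap_blks_hit: int, heap_blks_read: int) -> float:
    """Fraction of table block reads served from shared_buffers.
    Sustained values below ~0.99 on an OLTP workload usually mean
    shared_buffers is too small or the working set outgrew RAM."""
    total = heap_blks_hit + heap_blks_read
    return heap_blks_hit / total if total else 1.0

print(round(cache_hit_ratio(9_950_000, 50_000), 3))  # 0.995 — worth investigating
```

Note the counters are cumulative since the last stats reset, so alert on the ratio over a recent window, not the lifetime value.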
7. Connection Pooling with PgBouncer¶
PostgreSQL forks a process per connection. At 500+ connections, this is expensive. PgBouncer sits in front and multiplexes.
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction # transaction pooling (most common)
max_client_conn = 1000 # max client connections to pgbouncer
default_pool_size = 25 # connections per user/database pair to PostgreSQL
reserve_pool_size = 5 # extra connections for burst
server_idle_timeout = 300
Pool modes:
- session — connection held for entire client session (safest, least efficient)
- transaction — connection returned after each transaction (best for web apps)
- statement — connection returned after each statement (breaks multi-statement transactions)
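In transaction mode, the pool size needed is driven by transaction throughput and duration, not by client count. A sketch of the Little's-law estimate (Python; `required_pool_size` and the example numbers are illustrative):

```python
import math

def required_pool_size(tps: float, avg_txn_ms: float) -> int:
    """Little's law: concurrent server connections needed is roughly
    the arrival rate times how long each transaction holds a
    connection."""
    return math.ceil(tps * avg_txn_ms / 1000)

# 500 transactions/sec at 20ms each needs only ~10 server connections,
# even with 1000 clients connected to pgbouncer:
print(required_pool_size(500, 20))  # 10
```

Add headroom (the `reserve_pool_size` above) for bursts, and remember each user/database pair gets its own pool.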
Quick Reference¶
# Service management
sudo systemctl start/stop/restart/reload postgresql
# Check version
psql -U postgres -c "SELECT version();"
# Check config file location
psql -U postgres -c "SHOW config_file;"
# Check data directory
psql -U postgres -c "SHOW data_directory;"
# Database size
psql -U postgres -c "SELECT pg_size_pretty(pg_database_size('mydb'));"
# WAL location
psql -U postgres -c "SELECT pg_current_wal_lsn();"
# Replication lag in bytes
psql -U postgres -c "SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM pg_stat_replication;"
Wiki Navigation¶
Related Content¶
- AWS Database Flashcards (CLI) (flashcard_deck, L1) — Database Operations
- Database Operations Flashcards (CLI) (flashcard_deck, L1) — Database Operations
- Database Operations on Kubernetes (Topic Pack, L2) — Database Operations
- Database Ops Drills (Drill, L2) — Database Operations
- Interview: Database Failover During Deploy (Scenario, L3) — Database Operations
- Redis Operations (Topic Pack, L2) — Database Operations
- Runbook: Long-Running Query / Lock Contention (Runbook, L2) — PostgreSQL Operations
- Runbook: PostgreSQL Connection Exhaustion (Runbook, L2) — PostgreSQL Operations
- Runbook: PostgreSQL Disk Space Critical (Runbook, L2) — PostgreSQL Operations
- Runbook: PostgreSQL Replication Lag (Runbook, L2) — PostgreSQL Operations