
Progressive Hints

Hint 1 (after 5 min)

The replica (postgresql-1) shows Ready: False but Status: Running. This means the container is alive but failing its readiness probe. The replication state metric reports "startup" rather than "streaming": the replica is trying to connect to the primary but cannot establish a streaming connection.
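As a rough illustration of the Running-but-not-ready pattern, the Python sketch below inspects a Pod object shaped like the JSON that `kubectl get pod postgresql-1 -o json` returns. The sample status dictionary is hypothetical and trimmed to the two fields that matter here: `status.phase` and the `Ready` entry in `status.conditions`.

```python
from typing import Any


def readiness_summary(pod: dict[str, Any]) -> tuple[str, bool]:
    """Return (phase, ready) from a Pod object shaped like `kubectl get pod -o json` output."""
    status = pod.get("status", {})
    phase = status.get("phase", "Unknown")
    # The pod-level Ready condition is "False" while any container fails its readiness probe.
    ready = any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )
    return phase, ready


def alive_but_not_ready(pod: dict[str, Any]) -> bool:
    """True when the container is running but not yet passing its readiness probe."""
    phase, ready = readiness_summary(pod)
    return phase == "Running" and not ready


# Hypothetical, trimmed-down status for postgresql-1 mirroring the symptoms above.
replica = {
    "metadata": {"name": "postgresql-1"},
    "status": {
        "phase": "Running",
        "conditions": [
            {"type": "ContainersReady", "status": "False"},
            {"type": "Ready", "status": "False"},
        ],
    },
}

print(alive_but_not_ready(replica))  # True
```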

Hint 2 (after 10 min)

The PostgreSQL log from postgresql-1 says "could not connect to the primary server: connection refused." But pg_up reports both instances as up. So the primary is running and accepting exporter connections, but the replica cannot reach it on port 5432 via the headless service DNS name. Now look at the Terraform Network ACL rule: it only allows inbound traffic from 10.0.10.0/24 (subnet A), but the replica may be in subnet B (10.0.11.0/24).
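To test the subnet theory, a minimal sketch with Python's standard `ipaddress` module can check whether a source address falls inside the CIDR blocks the NACL admits. It assumes the replica's traffic arrives at the primary's subnet with an address from subnet B (for example, a pod IP assigned from the VPC subnet); the two IPs used below are hypothetical.

```python
import ipaddress

# CIDR blocks the NACL currently admits on port 5432 (per the Terraform rule above).
ALLOWED_SOURCES = [ipaddress.ip_network("10.0.10.0/24")]


def source_allowed(source_ip: str, allowed=ALLOWED_SOURCES) -> bool:
    """Return True if source_ip falls inside any allowed CIDR block."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed)


# Hypothetical addresses: one in subnet A, one in subnet B.
for ip in ("10.0.10.15", "10.0.11.23"):
    print(ip, "allowed" if source_allowed(ip) else "blocked")
# 10.0.10.15 allowed
# 10.0.11.23 blocked
```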

Hint 3 (after 15 min)

This is a PostgreSQL primary-replica pair running as a StatefulSet in Kubernetes, backed by AWS subnets in two availability zones. The primary is in subnet A (10.0.10.0/24) and the replica is in subnet B (10.0.11.0/24). The Network ACL only permits inbound PostgreSQL traffic (port 5432) from 10.0.10.0/24; there is no rule for 10.0.11.0/24. The replica has been unable to stream WAL from the primary since it was deployed, accumulating nearly 10 days of lag. The checkpoint messages in the primary's log are normal background activity, not the cause.
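The Python sketch below models how AWS evaluates inbound Network ACL rules: rules are checked in ascending rule-number order, the first match decides, and anything unmatched falls through to the final catch-all deny. The rule numbers, the TCP-only simplification, and the replica IP are assumptions for illustration; only the CIDR blocks and port 5432 come from the scenario. Because NACLs are stateless, a real fix also needs rules that let return traffic through on ephemeral ports, which this sketch does not model.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int     # lower numbers are evaluated first
    cidr: str       # source CIDR block for an inbound rule (TCP assumed)
    port_from: int
    port_to: int
    action: str     # "allow" or "deny"


def evaluate_inbound(rules: list[NaclRule], source_ip: str, port: int) -> str:
    """Mimic NACL inbound evaluation: the first matching rule by number wins, else deny."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_range = rule.port_from <= port <= rule.port_to
        if in_range and addr in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "deny"  # stands in for the catch-all "*" rule every NACL ends with


# Hypothetical rule numbers; the CIDRs and port come from the scenario.
current = [NaclRule(100, "10.0.10.0/24", 5432, 5432, "allow")]
fixed = current + [NaclRule(110, "10.0.11.0/24", 5432, 5432, "allow")]

replica_ip = "10.0.11.23"  # hypothetical address in subnet B
print(evaluate_inbound(current, replica_ip, 5432))  # deny
print(evaluate_inbound(fixed, replica_ip, 5432))    # allow
```

Adding the subnet B rule under a new number is enough in this model because no lower-numbered rule denies that range; if an earlier deny matched 10.0.11.0/24, it would still win.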