Grading Checklist - GrokDevOps Wiki

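Identifies the root cause: the conntrack table is full, so the kernel cannot create new connection tracking entries and drops the packets that would start new connections. Without an entry there is no NAT translation, so new connections through the gateway fail while already-tracked connections keep working.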

Finds the "nf_conntrack: table full, dropping packet" messages in dmesg/syslog as the key diagnostic

Checks the current conntrack entry count against the maximum (conntrack -C or sysctl net.netfilter.nf_conntrack_count vs sysctl net.netfilter.nf_conntrack_max)

Proposes increasing nf_conntrack_max as an immediate fix, with a value sized from the observed peak plus headroom and from the kernel memory the gateway can spare

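Also increases the hash table size (nf_conntrack_buckets, adjustable through the nf_conntrack module's hashsize parameter) proportionally, so hash chains stay short as the table grows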

Reviews and reduces conntrack timeouts, especially nf_conntrack_tcp_timeout_established (the default of 432000 seconds, i.e. 5 days, is often too long)

Analyzes the conntrack table to find heavy hitters (hosts or destinations with many entries)

Considers whether specific traffic should bypass conntrack with NOTRACK (or CT --notrack) rules in the raw table

Mentions that a single external IP limits SNAT to roughly 65535 simultaneous connections (the size of the source-port space) per destination IP, port, and protocol

Suggests monitoring conntrack usage (nf_conntrack_count as a fraction of nf_conntrack_max) as a standard metric going forward, with an alert well before the table fills

Does NOT suggest simply restarting the gateway as a recurring fix

Considers adding more external IPs to the NAT pool for long-term capacity
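The count-versus-maximum check and the ongoing monitoring item can be sketched in Python. This is a minimal sketch, assuming a Linux gateway with the nf_conntrack module loaded, which exposes nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter (the same values conntrack -C and sysctl report). The function names and the 80% warning threshold are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import Tuple

# Procfs location of the conntrack sysctls (Linux, nf_conntrack loaded).
PROC_DIR = Path("/proc/sys/net/netfilter")


def parse_counter(text: str) -> int:
    """Parse a single-integer procfs file such as nf_conntrack_count."""
    return int(text.strip())


def read_conntrack_usage(proc_dir: Path = PROC_DIR) -> Tuple[int, int]:
    """Return (current entry count, nf_conntrack_max) read from procfs."""
    count = parse_counter((proc_dir / "nf_conntrack_count").read_text())
    maximum = parse_counter((proc_dir / "nf_conntrack_max").read_text())
    return count, maximum


def usage_status(count: int, maximum: int, warn_ratio: float = 0.8) -> str:
    """Classify table usage; the 0.8 warning ratio is an assumed alert level."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    if count >= maximum:
        return "full"  # new flows are likely being dropped at this point
    if count / maximum >= warn_ratio:
        return "warning"
    return "ok"
```

On the gateway, usage_status(*read_conntrack_usage()) returns "ok", "warning", or "full"; exporting count / maximum as a metric to whatever monitoring system is already in place is left as an assumption.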
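Finding heavy hitters in the conntrack table amounts to counting entries by original-direction source and destination. A minimal sketch, assuming input lines in the key=value format printed by conntrack -L (lines from /proc/net/nf_conntrack carry the same src=/dst= fields after a short prefix). Each entry lists the original tuple first and the reply tuple second, so only the first src=/dst= pair is counted. original_tuple and top_talkers are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def original_tuple(line: str) -> Optional[Tuple[str, str]]:
    """Return (src, dst) of the original direction: the first src=/dst= pair."""
    src: Optional[str] = None
    dst: Optional[str] = None
    for field in line.split():
        if src is None and field.startswith("src="):
            src = field[len("src="):]
        elif dst is None and field.startswith("dst="):
            dst = field[len("dst="):]
        if src is not None and dst is not None:
            return src, dst
    return None  # not a conntrack entry (e.g. a summary line)


def top_talkers(
    lines: Iterable[str], n: int = 10
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return the n sources and n destinations holding the most entries."""
    by_src: Counter = Counter()
    by_dst: Counter = Counter()
    for line in lines:
        pair = original_tuple(line)
        if pair is None:
            continue
        src, dst = pair
        by_src[src] += 1
        by_dst[dst] += 1
    return by_src.most_common(n), by_dst.most_common(n)
```

The lines can come from a saved copy of conntrack -L output or from reading /proc/net/nf_conntrack; a single internal host or destination dominating the counts points at where timeouts or NOTRACK rules would help most.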
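The capacity items (raising nf_conntrack_max, scaling the hash buckets with it, and the per-IP SNAT port limit that motivates adding external IPs) reduce to simple arithmetic. A sketch under stated assumptions: the 2x headroom, the 4-entries-per-bucket ratio, and the 1024-65535 SNAT port range are illustrative defaults, not values from this checklist or guaranteed kernel defaults, so check the nf_conntrack sysctl documentation for the kernel in use. All function names are hypothetical.

```python
from typing import Tuple


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def suggest_conntrack_sizing(
    observed_peak: int, headroom: float = 2.0, entries_per_bucket: int = 4
) -> Tuple[int, int]:
    """Return (nf_conntrack_max, hash bucket count) for an observed peak.

    headroom and entries_per_bucket are assumptions, not kernel defaults.
    """
    target = next_power_of_two(int(observed_peak * headroom))
    buckets = max(1, target // entries_per_bucket)
    return target, buckets


def snat_ports_per_destination(
    external_ips: int, port_range: Tuple[int, int] = (1024, 65535)
) -> int:
    """Concurrent SNAT mappings possible toward ONE destination ip:port/protocol.

    Each mapping consumes one source port on one external IP; the default
    port range is an assumption and should match the actual NAT rule.
    """
    lo, hi = port_range
    return external_ips * (hi - lo + 1)
```

With an observed peak of 100000 entries, suggest_conntrack_sizing(100000) gives (262144, 65536); those values could then be applied with sysctl -w net.netfilter.nf_conntrack_max=262144 and by writing 65536 to /sys/module/nf_conntrack/parameters/hashsize. snat_ports_per_destination also shows that each external IP added to the NAT pool adds a full port range of capacity toward a busy destination.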
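Why the 5-day established timeout fills tables can be estimated with Little's law (entries held = arrival rate x time each entry is held). A minimal sketch, assuming a hypothetical steady rate of TCP connections that go silent without a FIN or RST (for example, clients that vanish), so their entries only expire when nf_conntrack_tcp_timeout_established runs out. stale_entries is a hypothetical helper name.

```python
# Default nf_conntrack_tcp_timeout_established: 432000 s (5 days).
DEFAULT_ESTABLISHED_TIMEOUT_S = 432_000


def stale_entries(orphan_rate_per_s: float, timeout_s: int) -> int:
    """Little's law estimate L = lambda * W of entries held by silent flows.

    Valid at steady state, i.e. once the gateway has run longer than
    timeout_s at a roughly constant orphan rate.
    """
    if orphan_rate_per_s < 0 or timeout_s < 0:
        raise ValueError("rate and timeout must be non-negative")
    return round(orphan_rate_per_s * timeout_s)
```

At a hypothetical 10 such connections per second, stale_entries(10, DEFAULT_ESTABLISHED_TIMEOUT_S) is 4,320,000 entries held at steady state, versus 36,000 with a one-hour timeout, which is why reducing the established timeout often matters more than raising the maximum alone.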
