Solution: NAT Port Exhaustion / Intermittent Failures
Summary
The Linux NAT gateway's nf_conntrack table has reached its maximum capacity (commonly 65536 entries by default; the exact default is derived from available RAM). When the table is full, the kernel cannot create new connection-tracking entries and drops packets for new flows. This manifests as intermittent failures: existing connections keep working because they already have conntrack entries, while new connections fail seemingly at random.
Senior Workflow
Step 1: Confirm conntrack table saturation
# Check for table full messages
dmesg | grep conntrack
# Expected: "nf_conntrack: table full, dropping packet."
# Check current count vs max
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max
# Expected: count ≈ max (e.g., 65536/65536)
# Or use conntrack tool:
conntrack -C # current count
conntrack -S # stats including drops
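The two sysctl reads above can be combined into a one-shot utilization check; a minimal sketch, assuming nf_conntrack is loaded (true on any conntrack-enabled gateway):

```shell
# Print conntrack table utilization as count/max with a percentage
count=$(sysctl -n net.netfilter.nf_conntrack_count)
max=$(sysctl -n net.netfilter.nf_conntrack_max)
awk -v c="$count" -v m="$max" \
  'BEGIN { printf "conntrack: %d/%d (%.1f%%)\n", c, m, c / m * 100 }'
```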
Step 2: Analyze what is consuming the table
# Count entries per internal source IP (top talkers). Take the first src=
# field on each line -- that is the original-direction tuple; field positions
# vary by protocol, so don't rely on fixed column numbers.
conntrack -L | awk '{for (i=1; i<=NF; i++) if ($i ~ /^src=/) { print substr($i, 5); break }}' | sort | uniq -c | sort -rn | head -10
# Count entries per destination (first dst= field)
conntrack -L | awk '{for (i=1; i<=NF; i++) if ($i ~ /^dst=/) { print substr($i, 5); break }}' | sort | uniq -c | sort -rn | head -10
# Count TCP entries by state (for tcp lines, field 4 is the state)
conntrack -L | awk '$1 == "tcp" { print $4 }' | sort | uniq -c | sort -rn
Look for: hosts with thousands of entries, TIME_WAIT entries piling up, or unexpected long-lived connections.
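On a saturated gateway, dumping a 65k-entry table three times is itself expensive. A sketch that captures one dump and summarizes it (same parsing assumptions as above):

```shell
# Capture one conntrack dump and summarize it, instead of listing three times
dump=$(mktemp)
conntrack -L > "$dump" 2>/dev/null

echo "== Top source IPs =="
awk '{for (i=1; i<=NF; i++) if ($i ~ /^src=/) { print substr($i, 5); break }}' "$dump" \
  | sort | uniq -c | sort -rn | head -10

echo "== TCP states =="
awk '$1 == "tcp" { print $4 }' "$dump" | sort | uniq -c | sort -rn

rm -f "$dump"
```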
Step 3: Check current timeout settings
sysctl net.netfilter.nf_conntrack_tcp_timeout_established
# Default: 432000 (5 days!) -- far too long for most NAT scenarios
sysctl net.netfilter.nf_conntrack_tcp_timeout_time_wait
# Default: 120
sysctl net.netfilter.nf_conntrack_tcp_timeout_close_wait
# Default: 60
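Rather than querying the three keys one by one, all conntrack timeout knobs (there are separate ones for UDP, ICMP, and each TCP state) can be reviewed in one pass:

```shell
# List every conntrack timeout setting at once
sysctl -a 2>/dev/null | grep 'nf_conntrack.*timeout'
```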
Step 4: Apply immediate fixes
# Increase the conntrack table size (immediate relief)
sysctl -w net.netfilter.nf_conntrack_max=262144
# Increase hash buckets proportionally (max/4 is a good ratio)
echo 65536 > /sys/module/nf_conntrack/parameters/hashsize
# Reduce the established connection timeout (5 days -> 1 hour)
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600
# Reduce TIME_WAIT timeout
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
Step 5: Make changes persistent
# /etc/sysctl.d/99-conntrack.conf
cat <<EOF > /etc/sysctl.d/99-conntrack.conf
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
EOF
sysctl --system
Step 6: Add NOTRACK for high-volume exempt traffic (optional)
# If monitoring or backup traffic floods conntrack, bypass it:
iptables -t raw -A PREROUTING -s 10.200.50.0/24 -d 10.200.60.0/24 -j NOTRACK
iptables -t raw -A OUTPUT -s 10.200.60.0/24 -d 10.200.50.0/24 -j NOTRACK
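If the gateway runs nftables instead of iptables, an equivalent bypass can be sketched with `notrack` in a raw-priority chain (same example subnets as above; on a forwarding gateway both directions traverse prerouting, and the `priority raw` keyword requires a reasonably recent nft):

```shell
# nftables equivalent of the NOTRACK rules (assumes nft with named priorities)
nft add table inet raw
nft 'add chain inet raw prerouting { type filter hook prerouting priority raw; }'
nft add rule inet raw prerouting ip saddr 10.200.50.0/24 ip daddr 10.200.60.0/24 notrack
nft add rule inet raw prerouting ip saddr 10.200.60.0/24 ip daddr 10.200.50.0/24 notrack
```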
Step 7: Set up monitoring
# Add to Prometheus/collectd/monitoring:
# Metric: net.netfilter.nf_conntrack_count / nf_conntrack_max
# Alert at 80% utilization
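Where no metrics pipeline exists yet, a cron-friendly threshold check is a stopgap; a minimal sketch (the 80% threshold matches the alert level above):

```shell
# Exit nonzero and print a warning when conntrack utilization reaches 80%
threshold=80
count=$(sysctl -n net.netfilter.nf_conntrack_count)
max=$(sysctl -n net.netfilter.nf_conntrack_max)
pct=$(( count * 100 / max ))
if [ "$pct" -ge "$threshold" ]; then
  echo "WARNING: conntrack at ${pct}% (${count}/${max})" >&2
  exit 1
fi
```

If node_exporter is already deployed, note that its conntrack collector exposes `node_nf_conntrack_entries` and `node_nf_conntrack_entries_limit`, which cover the same ratio without a custom script.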
Step 8: Verify
# Monitor conntrack usage after fix
watch 'sysctl net.netfilter.nf_conntrack_count; sysctl net.netfilter.nf_conntrack_max'
# Verify no more "table full" messages
dmesg -T | tail -20
Common Pitfalls
- Rebooting as a "fix": This flushes conntrack but the table fills up again. Address the root cause.
- Only increasing nf_conntrack_max: Without reducing timeouts, you're just buying time. A 5-day TCP established timeout means dead connections linger for days.
- Not increasing hashsize: A larger table with a small hash leads to long chains and CPU overhead. Keep hashsize = max/4.
- Ignoring heavy hitters: One misbehaving host (crawler, monitoring tool) can consume thousands of entries. Find and fix the source.
- Single external IP limitation: SNAT must allocate a unique source port per flow, so one external IP supports at most ~64k concurrent connections to a single destination IP:port pair -- and the usable ephemeral port range is typically smaller. Use a NAT pool of external addresses for scale.
- Not monitoring conntrack: This metric should be on every NAT gateway dashboard. Alert before it fills up.
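For the single-external-IP pitfall, a NAT pool can be sketched with an iptables SNAT range (the interface, internal subnet, and external addresses here are hypothetical placeholders):

```shell
# Hypothetical example: SNAT to a pool of four external addresses instead of
# one, multiplying the usable source-port space per destination
iptables -t nat -A POSTROUTING -o eth0 -s 10.200.0.0/16 \
  -j SNAT --to-source 198.51.100.10-198.51.100.13
```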