NAT Footguns¶
Mistakes that break connectivity, exhaust resources, or hide the real source of traffic.
1. Using MASQUERADE instead of SNAT on static IPs¶
MASQUERADE resolves the outgoing interface's address for every new connection instead of using a fixed address, and it invalidates its conntrack entries when the interface goes down. If your public IP never changes, this is wasted work. On high-traffic hosts setting up thousands of connections per second, the overhead is measurable.
Fix: Use SNAT --to-source <ip> when the public IP is static. Reserve MASQUERADE for interfaces with DHCP-assigned addresses.
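Side by side, the two rules might look like this (the interface name eth0 and all addresses are placeholders; both commands require root):

```shell
# Static public IP: SNAT encodes the address directly in the rule,
# so no interface-address lookup is needed per connection.
iptables -t nat -A POSTROUTING -o eth0 -s 10.0.0.0/24 \
    -j SNAT --to-source 203.0.113.1

# DHCP-assigned IP: MASQUERADE resolves the interface address
# dynamically, at the cost of that lookup.
iptables -t nat -A POSTROUTING -o eth0 -s 10.0.0.0/24 -j MASQUERADE
```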
2. Forgetting to enable ip_forward¶
You configure DNAT rules to forward traffic to a backend. Packets arrive at the NAT host but never reach the backend. The kernel silently discards forwarded packets because net.ipv4.ip_forward defaults to 0.
Fix: Enable forwarding: sysctl -w net.ipv4.ip_forward=1. Persist in /etc/sysctl.d/. Also add FORWARD chain rules to allow the traffic.
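A minimal sequence, sketched with an arbitrary drop-in file name (the 99-forwarding.conf name is a placeholder; requires root):

```shell
# Enable forwarding immediately (does not survive reboot)
sysctl -w net.ipv4.ip_forward=1

# Persist across reboots via a drop-in file
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-forwarding.conf
sysctl --system    # reload all sysctl configuration files

# Verify the running value
sysctl net.ipv4.ip_forward
```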
3. Not adding FORWARD rules alongside DNAT¶
You add a PREROUTING DNAT rule but forget to allow the traffic in the FORWARD chain. The destination is rewritten but the packet is dropped by the FORWARD chain's DROP policy. No errors — packets silently vanish.
Fix: For every DNAT rule, add a matching FORWARD ACCEPT: iptables -A FORWARD -p tcp -d 10.0.0.5 --dport 80 -j ACCEPT.
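The DNAT rule and its matching FORWARD rule, sketched with the backend address from the example above (interface name is a placeholder; requires root):

```shell
# Rewrite the destination of inbound traffic hitting port 80
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
    -j DNAT --to-destination 10.0.0.5:80

# Without this, a FORWARD DROP policy silently discards the rewritten
# packet. Note: match on the post-DNAT destination (10.0.0.5).
iptables -A FORWARD -p tcp -d 10.0.0.5 --dport 80 -j ACCEPT
```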
4. Conntrack table exhaustion on load balancers¶
Every NAT'd connection needs a conntrack entry. The default max is 65,536. A busy proxy or load balancer handling thousands of connections per second fills this within minutes. New connections are silently dropped. dmesg shows the error but nobody is watching.
Fix: Pre-size for your workload: sysctl -w net.netfilter.nf_conntrack_max=262144. Monitor nf_conntrack_count in your metrics. Reduce timeouts for TIME_WAIT and ESTABLISHED states.
War story: A Kubernetes cluster experienced intermittent timeouts, NodePort 503 errors, and sporadic pod-to-pod drops. The kernel logged nf_conntrack: table full, dropping packet but nobody was watching dmesg. The default nf_conntrack_max of 65,536 was exhausted by pod-to-service NAT traffic. The fix: sysctl -w net.netfilter.nf_conntrack_max=1048576 and monitoring nf_conntrack_count in Prometheus.
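The tuning knobs from the fix, sketched as commands. The values shown are illustrative; size nf_conntrack_max for your workload and requires root:

```shell
# Raise the conntrack ceiling
sysctl -w net.netfilter.nf_conntrack_max=1048576

# Tighten timeouts so dead entries release slots sooner
# (kernel defaults: TIME_WAIT 120s, ESTABLISHED 5 days)
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=86400

# Watch utilization; alert well before count approaches max
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
```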
5. Losing the real client IP through NAT¶
Full NAT (SNAT + DNAT) rewrites both source and destination. Backend servers see the load balancer's IP as the client. Access logs, rate limiting, and geo-blocking all break because every request appears to come from the same IP.
Fix: Use PROXY protocol, X-Forwarded-For headers, or Direct Server Return (DSR). Configure your backend to read the real client IP from these sources.
6. Source port exhaustion with a single public IP¶
Each NAT mapping needs a unique (source port, destination) tuple. With one public IP, you have roughly 64,000 usable ports per destination ip:port pair, so the space depletes fastest when many clients funnel through the gateway to the same backend. Once no free port can be allocated for a mapping, new outbound connections are silently dropped.
Fix: Add more public IPs: --to-source 203.0.113.1-203.0.113.4. Monitor conntrack -C and alert when approaching capacity.
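Sketched with documentation addresses (requires root and the conntrack tool):

```shell
# Spread SNAT across a block of public IPs to multiply the
# available source-port space
iptables -t nat -A POSTROUTING -o eth0 -s 10.0.0.0/24 \
    -j SNAT --to-source 203.0.113.1-203.0.113.4

# Current number of tracked connections, two ways
conntrack -C
cat /proc/sys/net/netfilter/nf_conntrack_count
```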
7. Stale conntrack entries after changing NAT rules¶
You change a DNAT rule to point to a new backend. Existing connections continue going to the old backend because conntrack caches the old translation. The change appears to have no effect until connections time out.
Fix: After changing NAT rules, flush relevant conntrack entries: conntrack -D -d <old-backend>. Or flush all: conntrack -F (causes brief disruption to all NAT'd connections).
Debug clue:
conntrack -L -d <old-backend-ip> shows stale entries still pointing to the old backend. If you see entries with the old destination after changing rules, that's your smoking gun. New connections will use the new rule; existing connections are frozen in conntrack until they expire or are flushed.
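The full sequence might look like this (the rule number and backend addresses are hypothetical; requires root):

```shell
# Find the position of the DNAT rule to change
iptables -t nat -L PREROUTING --line-numbers

# Replace rule 1 to point at the new backend
iptables -t nat -R PREROUTING 1 -p tcp --dport 80 \
    -j DNAT --to-destination 10.0.0.6:80

# Inspect entries still translated to the old backend
conntrack -L -d 10.0.0.5

# Flush only those stale entries; new connections already use 10.0.0.6
conntrack -D -d 10.0.0.5
```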
8. NAT loopback (hairpinning) not configured¶
Internal clients try to reach a public service by its external IP. The DNAT rule rewrites the destination, but the backend's reply goes directly to the internal client (not through the NAT). The client's TCP stack rejects the reply because the source IP is wrong.
Fix: Add an SNAT rule for internal-to-internal traffic through the NAT: iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 10.0.0.5 -p tcp --dport 80 -j MASQUERADE. Or configure split DNS so internal clients use the private IP directly.
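The full hairpin rule pair might look like this, with 203.0.113.1 standing in for the public IP (all addresses are placeholders; requires root):

```shell
# DNAT must also apply to traffic arriving from the LAN side,
# not just from the internet
iptables -t nat -A PREROUTING -s 10.0.0.0/24 -d 203.0.113.1 \
    -p tcp --dport 80 -j DNAT --to-destination 10.0.0.5

# SNAT the hairpinned traffic so the backend replies back through
# the NAT box instead of directly to the client
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 10.0.0.5 \
    -p tcp --dport 80 -j MASQUERADE
```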
9. Running conntrack -F on a busy production host¶
You flush the entire conntrack table to fix a NAT issue. Every tracked connection is destroyed instantly. All NAT'd TCP connections drop — every active user sees a reset. On a load balancer, this affects thousands of users simultaneously.
Fix: Flush selectively: conntrack -D -s <specific-ip> or conntrack -D -d <specific-ip> --dport <port>. Only flush the entries related to the issue.
10. Docker and manual NAT rules conflicting¶
You add manual iptables NAT rules on a Docker host. Docker also manages the nat table. On daemon restart or container re-creation, Docker rebuilds its rules and your manual rules may end up in the wrong order or be overridden entirely.
Fix: Use Docker's built-in port publishing (-p) for container NAT. If you need custom NAT alongside Docker, use the DOCKER-USER chain which Docker preserves across restarts.
Under the hood: Docker inserts its NAT rules at the top of the PREROUTING and POSTROUTING chains on daemon start and container creation. Your manually-added rules may end up after Docker's rules, never matching. The
DOCKER-USER chain is evaluated before Docker's own DOCKER chain and is never flushed by the daemon — it exists specifically for user-defined rules.
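A sketch of custom rules in DOCKER-USER (interface and network are placeholders; requires root). Note that DOCKER-USER ends in a RETURN rule, so use -I to insert before it; since -I prepends, the last rule inserted is evaluated first:

```shell
# Block forwarded traffic to containers from outside this network
iptables -I DOCKER-USER -i eth0 ! -s 198.51.100.0/24 -j DROP

# Let reply traffic through (inserted last, so checked before the DROP)
iptables -I DOCKER-USER -m conntrack \
    --ctstate ESTABLISHED,RELATED -j ACCEPT
```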