NAT (Network Address Translation) - Primer¶
Why This Matters¶
NAT is everywhere — from your home router to Kubernetes service networking. It lets private networks share public IPs, enables load balancing, and powers container networking. When NAT breaks, connections drop mysteriously, logs show wrong source IPs, and conntrack tables overflow under load. Understanding NAT internals is essential for debugging production networking issues.
Name origin: NAT was originally a stopgap invented in the 1990s to conserve IPv4 addresses (RFC 1631, 1994). It was supposed to be temporary until IPv6 adoption. Three decades later, NAT is still fundamental infrastructure.
Fun fact: Your home router performs NAT for every outbound connection. A typical household generates thousands of conntrack entries for streaming, browsing, and IoT devices — all mapped through a single public IP.
NAT Types¶
SNAT (Source NAT)¶
Changes the source IP of outgoing packets. Used when private hosts need internet access through a shared public IP.
# Static SNAT — map internal host to specific public IP
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j SNAT --to-source 203.0.113.1
# Masquerade — dynamic SNAT using the outgoing interface IP
# Use when the public IP is assigned via DHCP
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j MASQUERADE
SNAT is more efficient than MASQUERADE because it does not need to look up the interface IP on every packet. Use MASQUERADE only when the IP changes.
Gotcha: MASQUERADE also flushes conntrack entries when the interface goes down, which can be desirable for DHCP-assigned IPs but destructive for stable servers. On a server with a static IP, always use SNAT.
DNAT (Destination NAT)¶
Changes the destination IP of incoming packets. Used for port forwarding and load balancing.
Remember: NAT direction mnemonic: "SNAT = Source = Sending out" (postrouting, outbound), "DNAT = Destination = Delivering in" (prerouting, inbound). SNAT happens in POSTROUTING (after the routing decision), DNAT happens in PREROUTING (before the routing decision). Getting the chain wrong is the #1 NAT rule mistake.
# Forward port 8080 on public IP to internal web server
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 8080 \
-j DNAT --to-destination 10.0.0.5:80
# Must also allow the forwarded traffic
iptables -A FORWARD -p tcp -d 10.0.0.5 --dport 80 -j ACCEPT
# Enable IP forwarding
sysctl -w net.ipv4.ip_forward=1
Full NAT (SNAT + DNAT)¶
Used in load balancers where both source and destination are rewritten. The backend sees the load balancer's IP as the client, losing the real client IP (use X-Forwarded-For or PROXY protocol to preserve it).
Gotcha: Losing the real client IP breaks rate limiting, geo-routing, and audit logging. Always configure your load balancer to inject X-Forwarded-For (HTTP) or use the PROXY protocol (TCP). In Kubernetes, setting externalTrafficPolicy: Local on a Service preserves the client IP by skipping the extra SNAT hop.
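One way to apply this is a strategic-merge patch; a minimal sketch, assuming a NodePort or LoadBalancer Service named "web" (the name is hypothetical, adjust to your environment):

```shell
# Hypothetical Service name "web" -- note that only NodePort and
# LoadBalancer Services honor externalTrafficPolicy
kubectl patch service web \
  -p '{"spec": {"externalTrafficPolicy": "Local"}}'

# Confirm the change took effect
kubectl get service web -o jsonpath='{.spec.externalTrafficPolicy}'
```

The trade-off: with Local, nodes without a local endpoint drop the traffic, so load spreading across pods becomes uneven.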
Connection Tracking (conntrack)¶
The kernel tracks every NAT'd connection in the conntrack table. This is how it knows to reverse the translation on return packets.
# View active connections
conntrack -L
conntrack -L -p tcp --dport 80
# Count entries
conntrack -C
# Check table size
sysctl net.netfilter.nf_conntrack_max
# Increase if needed (default is often 65536)
sysctl -w net.netfilter.nf_conntrack_max=262144
# Check current usage
cat /proc/sys/net/netfilter/nf_conntrack_count
Conntrack States¶
| State | Meaning |
|---|---|
| NEW | First packet of a connection |
| ESTABLISHED | Packets in both directions seen |
| RELATED | Related to an existing connection (e.g., FTP data) |
| INVALID | Does not match any known connection |
| TIME_WAIT | Connection closed, waiting for stale packets |
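The TCP protocol states can be used to filter the conntrack listing. (Note: conntrack's --state flag matches TCP protocol states such as ESTABLISHED and TIME_WAIT; NEW/RELATED/INVALID from the table above are matched in iptables rules with -m conntrack --ctstate instead.)

```shell
# Show only fully established TCP flows
conntrack -L -p tcp --state ESTABLISHED

# Count TIME_WAIT entries; a large number suggests lowering
# nf_conntrack_tcp_timeout_time_wait to free table slots sooner
conntrack -L -p tcp --state TIME_WAIT | wc -l
```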
Under the hood: Conntrack uses a hash table indexed by the 5-tuple (src IP, dst IP, src port, dst port, protocol). Each entry consumes ~300 bytes of kernel memory. At 262,144 entries (a common production tuning), that is ~75 MB of unswappable kernel memory. The nf_conntrack_buckets sysctl controls the hash table size; set it to nf_conntrack_max / 4 for optimal lookup performance.
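The memory figure above can be reproduced with a quick calculation (the ~300-byte entry size is an approximation and varies by kernel build):

```shell
# Approximate kernel memory for a full conntrack table,
# assuming ~300 bytes per entry (varies by kernel version)
max=262144
bytes_per_entry=300
awk -v m="$max" -v b="$bytes_per_entry" \
    'BEGIN { printf "%.1f MB\n", m * b / 1048576 }'
# -> 75.0 MB
```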
NAT Exhaustion¶
Port Exhaustion¶
Each NAT mapping consumes a source port. With one public IP, you have roughly 64,000 source ports (the 1024-65535 range) available per destination IP:port pair. High-traffic hosts (proxies, API gateways) that fan out to a small set of destinations can exhaust this.
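The ephemeral range for locally originated connections is visible and tunable via sysctl, and SNAT rules can also pin an explicit source-port range (shown here with the example addresses used earlier):

```shell
# Ephemeral range the kernel uses for locally originated connections
sysctl net.ipv4.ip_local_port_range

# SNAT can pin its own source-port range explicitly
# (-p tcp is required when --to-source includes ports)
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -p tcp \
  -j SNAT --to-source 203.0.113.1:1024-65535
```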
Remember: The conntrack table size formula: nf_conntrack_max = nf_conntrack_buckets x 4 (by default). The default max is often 65536, which is dangerously low for proxies, load balancers, or Kubernetes nodes. A single busy node can easily have 100K+ active connections. Monitor with conntrack -C and alert at 80% capacity.
Symptoms:
- nf_conntrack: table full, dropping packet in dmesg
- New connections fail while existing ones work
- conntrack -C near nf_conntrack_max
Fixes:
# Increase conntrack table
sysctl -w net.netfilter.nf_conntrack_max=524288
sysctl -w net.netfilter.nf_conntrack_buckets=131072
# Reduce timeouts for finished connections
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=600
# Add more public IPs for SNAT
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 \
-j SNAT --to-source 203.0.113.1-203.0.113.4
NAT in Container Networking¶
Docker¶
Docker uses MASQUERADE for container-to-external traffic and DNAT for published ports:
# See Docker's NAT rules
iptables -t nat -L -n -v
# Docker creates rules like:
# MASQUERADE tcp 172.17.0.0/16 -> anywhere (outbound)
# DNAT tcp anywhere:8080 -> 172.17.0.2:80 (published port)
Kubernetes¶
kube-proxy implements Service ClusterIPs using DNAT (iptables mode) or IPVS. NodePort services add DNAT rules on every node.
Gotcha: In large Kubernetes clusters, kube-proxy in iptables mode creates O(n) rules per service endpoint. A cluster with 5,000 services and 10 endpoints each generates ~50,000 iptables rules, causing noticeable latency on rule updates and connection establishment. This is why IPVS mode (O(1) lookup via hash table) is preferred at scale. Switch with --proxy-mode=ipvs on kube-proxy.
Timeline: NAT's evolution: RFC 1631 (1994, original NAT spec) -> RFC 3022 (2001, "Traditional IP NAT" replacing 1631) -> RFC 5382 (2008, NAT behavioral requirements for TCP) -> RFC 6146 (2011, NAT64 for IPv6 transition). The Linux netfilter conntrack system that powers modern NAT was written by Rusty Russell and integrated into Linux 2.4 (2001).
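To check which mode a running kube-proxy is actually using, one option (assuming the standard kubeadm layout, where the config lives in a kube-system ConfigMap under the key config.conf) is:

```shell
# Read the proxy mode from kube-proxy's ConfigMap
# (the "config.conf" key name is the kubeadm default; other
# distributions may store the config elsewhere)
kubectl -n kube-system get configmap kube-proxy \
  -o jsonpath='{.data.config\.conf}' | grep -m1 'mode:'
```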
nftables (Modern Replacement)¶
# Equivalent SNAT with nftables
nft add table nat
nft add chain nat postrouting { type nat hook postrouting priority 100 \; }
nft add rule nat postrouting oifname "eth0" masquerade
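A DNAT counterpart, reusing the nat table created above and mirroring the earlier iptables port-forward to 10.0.0.5:80, might look like this (priority -100 places the chain in the conventional dstnat slot, before routing):

```shell
# Forward public :8080 to an internal web server via nftables
nft add chain nat prerouting { type nat hook prerouting priority -100 \; }
nft add rule nat prerouting iifname "eth0" tcp dport 8080 dnat to 10.0.0.5:80
```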
Debugging NAT Issues¶
# Trace NAT translations in real-time
conntrack -E
# Check for dropped packets due to full table
dmesg | grep conntrack
# Verify NAT rules are matching
iptables -t nat -L -n -v --line-numbers
# Check specific connection
conntrack -L -s 10.0.0.5 -p tcp --dport 80
# Monitor conntrack usage
watch -n1 'cat /proc/sys/net/netfilter/nf_conntrack_count'
Quick Reference¶
| Task | Command |
|---|---|
| List NAT rules | iptables -t nat -L -n -v |
| Add SNAT | iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 -j MASQUERADE |
| Add DNAT | iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to 10.0.0.5:80 |
| View conntrack | conntrack -L |
| Count connections | conntrack -C |
| Check table limit | sysctl net.netfilter.nf_conntrack_max |
| Flush conntrack | conntrack -F |
| Enable forwarding | sysctl -w net.ipv4.ip_forward=1 |
Wiki Navigation¶
Related Content¶
- NAT Flashcards (CLI) (flashcard_deck, L1) — NAT