MTU Footguns¶
Mistakes that cause silent packet drops, intermittent stalls, and hard-to-diagnose transfer failures.
1. Blocking ICMP "need to frag" messages at the firewall¶
Your firewall drops all ICMP. Path MTU Discovery depends on ICMP type 3, code 4 ("Fragmentation Needed"). Without it, senders never learn that packets are too large. Large transfers silently stall while small packets (pings, DNS) work fine. This is a PMTUD blackhole.
Fix: Never blanket-block ICMP. At minimum, allow ICMP type 3 (destination unreachable) in both directions. This is the single most common cause of mysterious transfer failures.
War story: A well-documented AWS incident involved a Spring web service timing out when connecting to an external SOAP service via IPSec tunnel. Small requests (DNS, pings) worked; large HTTP responses stalled. The root cause: ICMP was blocked in both NACLs and Security Groups. RFC 2923 documents this class of failure — PMTUD blackholes have been a known problem since 2000, yet teams keep blocking ICMP "for security."
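As a sketch, an nftables ruleset fragment that keeps PMTUD feedback flowing even under a restrictive policy (chain layout and policy are illustrative; adapt to your existing ruleset). The IPv6 rule matters even more: IPv6 routers never fragment, so Packet Too Big messages are the only mechanism.

```nft
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        # PMTUD feedback: ICMP type 3, including code 4 "Fragmentation Needed"
        icmp type destination-unreachable accept
        # IPv6 equivalent: without Packet Too Big, IPv6 PMTUD cannot work at all
        icmpv6 type packet-too-big accept
    }
}
```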
2. Enabling jumbo frames on only part of the path¶
You set MTU 9000 on two servers but forget the switch between them. The switch silently drops frames larger than 1500 bytes. Small traffic works, large transfers fail or run at a fraction of expected speed. No errors appear in the server logs.
Fix: Jumbo frames must be enabled on every device in the path: NICs, switches, routers. Verify end-to-end with ping -M do -s 8972. One device at 1500 breaks the entire chain.
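The verification above can be scripted. A minimal sketch, assuming a hypothetical peer at 10.0.0.2 — the helper just encodes the "MTU minus 28" arithmetic (20-byte IP header + 8-byte ICMP header):

```shell
# Payload size that makes an IPv4 ping exactly hit a target MTU on the wire.
payload_for_mtu() { echo $(( $1 - 28 )); }

# usage (hypothetical peer 10.0.0.2):
#   ping -M do -c 3 -s "$(payload_for_mtu 9000)" 10.0.0.2   # jumbo path: -s 8972
#   ping -M do -c 3 -s "$(payload_for_mtu 1500)" 10.0.0.2   # sanity check: -s 1472
```

If the jumbo-sized probe fails while the 1500-sized one succeeds, some device in the path is still at 1500.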
3. Forgetting overlay overhead in container networks¶
Your underlay is 1500 MTU. VXLAN adds 50 bytes of header. Pod MTU is left at the default 1500. Pods send 1500-byte packets that become 1550 bytes on the wire — too large for the underlay. DNS works (small packets) but HTTP responses with real data stall.
Fix: Set pod/container MTU to underlay MTU minus overlay overhead. For VXLAN: 1450. For GRE: 1476. For WireGuard: 1440 over an IPv4 underlay (wg-quick defaults to 1420 so an IPv6 underlay also fits). Configure this in your CNI plugin settings.
Remember: Overlay overhead cheat sheet: VXLAN = 50 bytes (8 UDP + 8 VXLAN + 14 inner Ethernet + 20 outer IP), GRE = 24 bytes, WireGuard = 60 bytes, Geneve = 50+ bytes (variable). If your underlay is 1500, your pod MTU must be 1500 - overhead.
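The cheat sheet reduces to one subtraction. A small helper (overhead values taken from the table above; WireGuard assumes an IPv4 underlay):

```shell
# Pod MTU = underlay MTU - overlay overhead.
pod_mtu() {  # usage: pod_mtu <underlay_mtu> <vxlan|gre|wireguard>
  case $2 in
    vxlan)     echo $(( $1 - 50 )) ;;
    gre)       echo $(( $1 - 24 )) ;;
    wireguard) echo $(( $1 - 60 )) ;;  # IPv4 underlay
    *)         return 1 ;;
  esac
}

# e.g. pod_mtu 1500 vxlan  → 1450
#      pod_mtu 1500 gre    → 1476
```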
4. Testing MTU with ping but forgetting the header overhead¶
You run ping -M do -s 1500 host. This sends a 1528-byte packet (1500 + 20 IP + 8 ICMP). It fails, and you incorrectly conclude the path MTU is below 1500. The path MTU is actually 1500 — your test was sending packets 28 bytes too large.
Fix: For a 1500 MTU path, test with ping -M do -s 1472 (1472 payload + 28 header = 1500 total). Always subtract 28 from the MTU to get the correct ping size.
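The off-by-28 arithmetic, spelled out as two tiny helpers:

```shell
# On-wire IPv4 ICMP size = ping payload + 20 (IP header) + 8 (ICMP header).
wire_size() { echo $(( $1 + 28 )); }
# The payload that hits a target MTU exactly:
ping_size() { echo $(( $1 - 28 )); }

# wire_size 1500 → 1528 (exceeds a 1500 path: the mistake described above)
# ping_size 1500 → 1472 (the correct -s value for a 1500 path)
```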
5. Changing MTU on a live production interface¶
You run ip link set eth0 mtu 9000 on a busy server. Existing TCP connections negotiated their MSS at the old MTU and mostly keep working, but some NIC drivers bounce the link when the MTU changes, briefly dropping all traffic. New connections negotiate at the new MTU and may send packets that intermediate devices cannot handle. If the change is wrong, you learn by having an outage.
Fix: Test MTU changes during maintenance windows. Verify the new MTU works end-to-end with ping -M do before applying. Keep a rollback command ready: ip link set eth0 mtu 1500.
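One way to make the rollback trivial is to build the exact rollback command from sysfs before touching anything. A sketch (device name, peer, and the use of $ROLLBACK are illustrative):

```shell
# Capture the exact rollback command *before* changing anything.
rollback_cmd() {  # usage: rollback_cmd eth0
  echo "ip link set dev $1 mtu $(cat "/sys/class/net/$1/mtu")"
}

# typical flow (hypothetical device eth0, run in a maintenance window):
#   ROLLBACK=$(rollback_cmd eth0)
#   ip link set dev eth0 mtu 9000
#   ping -M do -c 3 -s 8972 <peer> || eval "$ROLLBACK"
```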
6. Mismatched MTU on bond members¶
You set MTU 9000 on bond0, but one member NIC or its switch port only handles 1500. Depending on the driver, the member may accept the setting yet still drop jumbo frames in hardware. Large frames hashed onto the limited member vanish; traffic through the other member works fine — intermittent failures that are maddening to debug.
Fix: Verify all bond members support the desired MTU before setting it on the bond. Check with ethtool -i ethX for driver/firmware capabilities.
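On kernels 4.19 and later, ip -d link show reports each NIC's hardware maximum as a maxmtu field, which is more direct than inferring it from ethtool. A sketch that parses it for every bond member (the bond name and loop are illustrative; output format assumed from iproute2):

```shell
# Extract the maxmtu value from `ip -d link show` output on stdin.
max_mtu() { grep -o 'maxmtu [0-9]*' | awk '{print $2}'; }

# usage (hypothetical bond0):
#   for s in $(cat /sys/class/net/bond0/bonding/slaves); do
#     printf '%s max MTU: %s\n' "$s" "$(ip -d link show "$s" | max_mtu)"
#   done
```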
7. Not persisting MTU changes¶
You fix an MTU issue with ip link set dev eth0 mtu 1450. The host reboots. MTU reverts to the default. The problem returns. You fix it again. It reboots again. This cycle repeats until someone adds it to the persistent config.
Fix: Persist in netplan, NetworkManager, or systemd-networkd. For nmcli: nmcli con mod eth0 802-3-ethernet.mtu 1450, then reactivate the connection for it to take effect. Verify after reboot.
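For a netplan-managed host, a minimal sketch (file name and interface are illustrative; apply with netplan apply):

```yaml
# /etc/netplan/99-mtu.yaml (illustrative name)
network:
  version: 2
  ethernets:
    eth0:
      mtu: 1450
```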
8. Setting MSS clamp too low¶
You clamp TCP MSS to 1200 "to be safe." Every TCP connection now uses 1200-byte segments instead of 1460. Each packet carries less data for the same per-packet overhead, so goodput drops roughly 18% compared to correct sizing. On high-bandwidth paths, this waste is significant.
Fix: Calculate the correct MSS from the path MTU: MSS = MTU - 40 (20 IP + 20 TCP). Do not over-compensate. Use --clamp-mss-to-pmtu to let the kernel calculate automatically.
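The MSS arithmetic, plus the kernel-derived alternative (the iptables rule is the standard TCPMSS clamp; table and chain shown are the usual placement for forwarded traffic):

```shell
# MSS for a given IPv4 path MTU: 20 (IP) + 20 (TCP) = 40 bytes of headers.
mss_for_mtu() { echo $(( $1 - 40 )); }

# e.g. mss_for_mtu 1500 → 1460,  mss_for_mtu 1450 → 1410
# or let the kernel derive it per-route instead of hardcoding a value:
#   iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
#            -j TCPMSS --clamp-mss-to-pmtu
```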
9. Assuming cloud instances have standard 1500 MTU¶
AWS instances within the same VPC can use jumbo frames (MTU 9001). Cross-VPC or internet-bound traffic drops to 1500. If your application sends jumbo frames to the internet without PMTUD working, packets are silently dropped.
Fix: Check your cloud provider's MTU documentation. AWS: 9001 intra-VPC, 1500 cross-VPC. GCP: 1460 default. Set interface MTU appropriately or ensure PMTUD works.
Gotcha: AWS uses MTU 9001 (not 9000) intra-VPC. GCP defaults to 1460 because it reserves 40 bytes for internal encapsulation. Azure defaults to 1500. Cross-provider VPN tunnels compound the problem — each hop may subtract different overhead.
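To see what a path actually supports rather than trusting documentation, tracepath discovers the path MTU without root. A sketch that pulls the last reported value (target is hypothetical; the "pmtu N" output format is assumed from iputils tracepath):

```shell
# Extract the final pmtu value from tracepath output on stdin.
last_pmtu() { grep -o 'pmtu [0-9]*' | tail -n 1 | awk '{print $2}'; }

# usage (hypothetical target):
#   tracepath -n example.com | last_pmtu
```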
10. Ignoring MTU on loopback and tunnel interfaces¶
You troubleshoot a local service that communicates over a tunnel interface. The tunnel MTU was auto-configured to 1480 but the application sends 1500-byte messages. Because it is local, you never think to check the tunnel MTU. Packets are silently fragmented or dropped, depending on the DF bit.
Fix: Check MTU on all interfaces in the path, including tunnels and virtual interfaces: ip link show | grep mtu. Do not assume non-physical interfaces have standard MTU.
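A quick inventory straight from sysfs, which covers tunnels and virtual interfaces alongside physical NICs (Linux-only; assumes /sys is mounted):

```shell
# Print the MTU of every interface, physical or virtual.
list_mtus() {
  for d in /sys/class/net/*; do
    printf '%-12s %s\n' "${d##*/}" "$(cat "$d/mtu")"
  done
}

# e.g. list_mtus   (loopback typically shows 65536; tunnels show their real MTU)
```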