MTU (Maximum Transmission Unit) - Primer

Why This Matters

MTU mismatches cause some of the most baffling network issues: small pings work but large transfers hang, SSH connects but SCP stalls, web pages partially load. These problems are invisible to basic connectivity tests and can persist for weeks before anyone connects the symptoms to an MTU issue.

Fundamentals

MTU is the largest packet size (in bytes) that a network interface will transmit without fragmentation.

Name origin: The 1500-byte Ethernet MTU dates to the original DIX Ethernet specification (1980). It was a compromise between efficiency (larger frames = less overhead) and the cost of buffer memory on early 1980s network interface cards. The 1500-byte standard has survived four decades despite dramatic drops in memory costs.

Network Type           Typical MTU
Ethernet               1500 bytes
Jumbo frames           9000 bytes
PPPoE (DSL)            1492 bytes
VPN tunnels (IPsec)    1400-1436 bytes
VXLAN overlay          1450 bytes
GRE tunnel             1476 bytes

The IPv4 header is 20 bytes and the TCP header is 20 bytes, so with a 1500-byte MTU the maximum TCP payload (the MSS, Maximum Segment Size) is 1460 bytes.

Remember: The "MTU math" shortcut: MSS = MTU - 40 (for IPv4 without options). For IPv6: MSS = MTU - 60 (IPv6 header is 40 bytes). For tunnels, subtract the overlay header first, then subtract 40. Example: VXLAN on 1500 underlay = 1450 MTU = 1410 MSS.
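
The shortcut above can be expressed as a small helper. This is a toy sketch: header sizes are the standard fixed values (no IP or TCP options), and the overlay overheads come from the tables on this page.

```python
# Toy MSS calculator for the "MTU math" shortcut.
IPV4_TCP_OVERHEAD = 40   # 20-byte IPv4 header + 20-byte TCP header
IPV6_TCP_OVERHEAD = 60   # 40-byte IPv6 header + 20-byte TCP header

def mss(mtu, ipv6=False, overlay_overhead=0):
    """Return the max TCP payload for a given MTU.

    overlay_overhead: bytes added by a tunnel (e.g. 50 for VXLAN),
    subtracted first, then the IP+TCP headers.
    """
    effective_mtu = mtu - overlay_overhead
    return effective_mtu - (IPV6_TCP_OVERHEAD if ipv6 else IPV4_TCP_OVERHEAD)

print(mss(1500))                       # 1460: plain Ethernet, IPv4
print(mss(1500, overlay_overhead=50))  # 1410: VXLAN on a 1500 underlay
print(mss(1500, ipv6=True))            # 1440: plain Ethernet, IPv6
```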

Under the hood: The full Ethernet frame is actually 1518 bytes: 14-byte Ethernet header (6B dst MAC + 6B src MAC + 2B EtherType) + 1500-byte payload + 4-byte FCS (Frame Check Sequence). With an 802.1Q VLAN tag, the frame grows to 1522 bytes. The "1500 MTU" refers only to the Layer 3 payload.
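
A quick sanity check of the frame arithmetic described above:

```python
# Ethernet frame size arithmetic from the layout above.
ETH_HEADER = 14   # 6B dst MAC + 6B src MAC + 2B EtherType
FCS = 4           # Frame Check Sequence
VLAN_TAG = 4      # 802.1Q tag
MTU = 1500        # Layer 3 payload only

print(ETH_HEADER + MTU + FCS)             # 1518: untagged frame
print(ETH_HEADER + VLAN_TAG + MTU + FCS)  # 1522: with 802.1Q tag
```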

Path MTU Discovery (PMTUD)

PMTUD lets endpoints discover the smallest MTU along a path without manual configuration.

  1. Sender sends packets with the Don't Fragment (DF) bit set
  2. If a router cannot forward (packet > link MTU), it drops the packet and sends back an ICMP "Fragmentation Needed" message
  3. Sender reduces its packet size (RFC 1191-compliant routers report the next-hop MTU in the ICMP message) and retries
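
The feedback loop above can be sketched in a few lines. This is a toy simulation with hypothetical link MTUs, not how the kernel implements PMTUD:

```python
# Toy sketch of the PMTUD feedback loop over a hypothetical path.
def path_mtu_discovery(initial_size, link_mtus):
    """Return the packet size the sender settles on.

    link_mtus: MTUs of the links along the path, in order.
    """
    size = initial_size
    for link_mtu in link_mtus:
        if size > link_mtu:
            # Router drops the DF packet and reports its link MTU in
            # the ICMP "Fragmentation Needed" message; sender shrinks.
            size = link_mtu
    return size

print(path_mtu_discovery(1500, [1500, 1476, 1450]))  # 1450
print(path_mtu_discovery(1400, [1500, 1450]))        # 1400: already fits
```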

PMTUD Blackholes

PMTUD breaks when ICMP is blocked by firewalls. The sender never gets the "too big" message and keeps retrying with oversized packets. This is an MTU blackhole.

Gotcha: Many firewall administrators block all ICMP "for security." This breaks PMTUD and creates MTU blackholes that are extremely hard to diagnose. The correct practice is to allow ICMP Type 3 (Destination Unreachable), especially Code 4 (Fragmentation Needed). Blocking this specific ICMP type causes real outages; allowing it is not a security risk.

Symptoms:

  • Small packets work (ping, DNS, SSH login)
  • Large transfers hang (SCP, HTTP downloads, database queries)
  • TCP connections are established but data transfer stalls

Fragmentation

Debug clue: The classic MTU blackhole pattern: ping -s 56 host works, but ping -M do -s 1472 host fails with "message too long" or simply times out. If the small ping works and the large one silently hangs (no ICMP error returned), a firewall between you and the target is dropping ICMP Type 3 Code 4 messages. Use tracepath to find the hop where the MTU drops.

When DF is not set, oversized packets get fragmented at routers. This is bad:

  • Fragments must all arrive for reassembly — one lost fragment means retransmit everything
  • Fragments increase CPU load on routers
  • Stateful firewalls may not track fragments properly
  • Fragment reassembly attacks are a DoS vector
# Check fragmentation stats
cat /proc/net/snmp | grep -i frag
netstat -s | grep -i frag

Jumbo Frames

Jumbo frames use MTU 9000 (or up to 9216). Benefits: fewer packets, less CPU overhead, higher throughput for bulk transfers.

Fun fact: The term "jumbo frames" was never formally standardized by IEEE. The 9000-byte size became a de facto standard because early Alteon Networks switches supported it. Some vendors support up to 9216 bytes (9000 payload + Ethernet overhead). The name stuck because 9000 bytes felt "jumbo" compared to 1500.

Gotcha: AWS EC2 instances support jumbo frames (MTU 9001) within the same VPC, but traffic crossing a VPC peering connection, VPN, or internet gateway is clamped to 1500. Forgetting this causes silent packet drops on cross-VPC bulk transfers.

Requirements:

  • Every device in the path must support jumbo frames: NICs, switches, routers
  • One device at 1500 in a jumbo-frame path silently drops oversized frames
  • Typically used only within datacenters on dedicated storage/cluster networks

# Set jumbo frames
ip link set dev eth0 mtu 9000

# Persistent (RHEL/CentOS)
# Add MTU=9000 to /etc/sysconfig/network-scripts/ifcfg-eth0

# Persistent (netplan)
# network:
#   ethernets:
#     eth0:
#       mtu: 9000

Debugging MTU Issues

Test with ping

# Test specific packet size (1472 + 28 bytes IP/ICMP header = 1500)
ping -M do -s 1472 10.0.0.1          # Linux (-M do = don't fragment)
ping -D -s 1472 10.0.0.1             # macOS (-D = don't fragment)

# Binary search for path MTU
ping -M do -s 1400 10.0.0.1          # works? try higher
ping -M do -s 1450 10.0.0.1          # works? try higher
ping -M do -s 1472 10.0.0.1          # fails? MTU is between 1450-1472
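
The manual search above can be automated as a binary search. A sketch, where `probe` is a hypothetical stand-in for a DF-bit ping (here faked for illustration):

```python
# Binary search for path MTU, mirroring the manual ping sequence above.
# `probe(payload)` sends a DF packet with that payload size and returns
# True on success.
def find_path_mtu(probe, lo=1200, hi=1500):
    """Largest MTU in [lo, hi] whose payload (MTU - 28) gets through."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid - 28):   # payload = MTU - 28 (IP + ICMP headers)
            lo = mid          # fits: search larger sizes
        else:
            hi = mid - 1      # dropped: search smaller sizes
    return lo

# Fake path that pretends the bottleneck MTU is 1450.
fake_path = lambda payload: payload + 28 <= 1450
print(find_path_mtu(fake_path))  # 1450
```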

Check interface MTU

ip link show eth0 | grep mtu
cat /sys/class/net/eth0/mtu

tracepath (discovers path MTU)

tracepath 10.0.0.1
# Shows MTU at each hop and the overall path MTU

tcpdump for fragmentation

# Look for fragmented packets
tcpdump -i eth0 'ip[6:2] & 0x3fff != 0'

# Look for ICMP "need to frag" messages
tcpdump -i eth0 'icmp[0] == 3 and icmp[1] == 4'
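
The first filter works because bytes 6-7 of the IPv4 header carry the flags (top 3 bits) and the 13-bit fragment offset: a packet is a fragment if the MF (More Fragments) bit is set or the offset is non-zero. The same test in Python, with hand-crafted header bytes for illustration:

```python
# What `ip[6:2] & 0x3fff != 0` actually tests on the IPv4 header.
import struct

def is_fragment(ip_header: bytes) -> bool:
    flags_frag = struct.unpack("!H", ip_header[6:8])[0]
    return (flags_frag & 0x3FFF) != 0  # MF bit (0x2000) + offset bits

# 20-byte headers that differ only in the flags/offset field:
df_packet = bytes(6) + struct.pack("!H", 0x4000) + bytes(12)  # DF set
mf_packet = bytes(6) + struct.pack("!H", 0x2000) + bytes(12)  # MF set

print(is_fragment(df_packet))  # False: DF is bit 0x4000, masked out
print(is_fragment(mf_packet))  # True: first fragment of a series
```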

MTU in Overlay Networks

Overlay networks (VXLAN, GRE, IPsec) add headers, reducing effective MTU:

Overlay           Header Overhead    Effective MTU (on 1500 underlay)
VXLAN             50 bytes           1450
GRE               24 bytes           1476
IPsec (tunnel)    52-73 bytes        1427-1448
WireGuard         60 bytes           1440
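
The table values follow mechanically from the underlay MTU. A sketch for the fixed-overhead overlays (IPsec omitted since its overhead varies with cipher and mode):

```python
# Effective MTU = underlay MTU - overlay header overhead; the MSS then
# follows from the usual "subtract 40" rule for IPv4 + TCP.
OVERHEADS = {"VXLAN": 50, "GRE": 24, "WireGuard": 60}

def overlay_mtu(underlay_mtu, overlay):
    return underlay_mtu - OVERHEADS[overlay]

for name, overhead in OVERHEADS.items():
    eff = overlay_mtu(1500, name)
    print(f"{name}: {overhead}B overhead -> MTU {eff}, MSS {eff - 40}")
```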

Kubernetes/Container MTU

CNI plugins must set the pod MTU correctly. If the underlay is 1500 and the overlay adds 50 bytes, the pod MTU should be 1450. A misconfigured MTU causes:

  • Pod-to-pod large transfers fail
  • DNS works but HTTP hangs
  • Intermittent timeouts on services

War story: A common Kubernetes outage pattern: new cluster, Calico CNI with VXLAN; pods on the same node communicate fine, but cross-node large HTTP responses hang. Cause: Calico defaults to MTU 1440, but the admin manually set the pod MTU to 1500, not accounting for the 50-byte VXLAN overhead. Small DNS and health-check packets work fine; large API responses silently drop. Fix: set the CNI MTU to underlay_MTU - overlay_overhead.

# Check pod interface MTU
kubectl exec -it <pod> -- ip link show eth0

TCP MSS Clamping

Analogy: MSS clamping is like putting a "max height" sign on a tunnel entrance. Instead of letting trucks (large packets) drive in and get stuck, you tell them their maximum size during the TCP handshake (SYN packet). The sender then voluntarily keeps all its segments under that limit. This works even when PMTUD is broken because it operates at connection setup time, not during data transfer.

When you cannot fix MTU everywhere, clamp the TCP MSS to force smaller segments:

# Clamp MSS to match a 1400 MTU
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --set-mss 1360

# Or auto-clamp to PMTU
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu

Quick Reference

Task                       Command
Show interface MTU         ip link show eth0
Set MTU                    ip link set dev eth0 mtu 9000
Test path MTU              ping -M do -s 1472 <host>
Discover path MTU          tracepath <host>
Find fragmented packets    tcpdump -i eth0 'ip[6:2] & 0x3fff != 0'
Clamp TCP MSS              iptables -t mangle ... -j TCPMSS --clamp-mss-to-pmtu
Check frag stats           netstat -s | grep -i frag
