MTU (Maximum Transmission Unit) - Primer

Why This Matters

MTU mismatches cause some of the most baffling network issues: small pings work but large transfers hang, SSH connects but SCP stalls, web pages partially load. These problems are invisible to basic connectivity tests and can persist for weeks before anyone connects the symptoms to an MTU issue.

Fundamentals

MTU is the largest packet size (in bytes) that a network interface will transmit without fragmentation.

Name origin: The 1500-byte Ethernet MTU dates to the original DIX Ethernet specification (1980). It was a compromise between efficiency (larger frames = less overhead) and the cost of buffer memory on early 1980s network interface cards. The 1500-byte standard has survived four decades despite dramatic drops in memory costs.

Network Type           Typical MTU
Ethernet               1500 bytes
Jumbo frames           9000 bytes
PPPoE (DSL)            1492 bytes
VPN tunnels (IPsec)    1400-1436 bytes
VXLAN overlay          1450 bytes
GRE tunnel             1476 bytes

The IPv4 header is 20 bytes and the TCP header is 20 bytes, so with a 1500-byte MTU the maximum TCP payload (the MSS, Maximum Segment Size) is 1460 bytes.

Remember: The "MTU math" shortcut: MSS = MTU - 40 (for IPv4 without options). For IPv6: MSS = MTU - 60 (IPv6 header is 40 bytes). For tunnels, subtract the overlay header first, then subtract 40. Example: VXLAN on 1500 underlay = 1450 MTU = 1410 MSS.
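
The shortcut above can be expressed as a small helper. This is a toy sketch: header sizes are the standard fixed values (no IP or TCP options), and the overlay overheads come from the tables on this page.

```python
# Toy MSS calculator for the "MTU math" shortcut.
IPV4_TCP_OVERHEAD = 40   # 20-byte IPv4 header + 20-byte TCP header
IPV6_TCP_OVERHEAD = 60   # 40-byte IPv6 header + 20-byte TCP header

def mss(mtu, ipv6=False, overlay_overhead=0):
    """Return the max TCP payload for a given MTU.

    overlay_overhead: bytes added by a tunnel (e.g. 50 for VXLAN),
    subtracted first, then the IP+TCP headers.
    """
    effective_mtu = mtu - overlay_overhead
    return effective_mtu - (IPV6_TCP_OVERHEAD if ipv6 else IPV4_TCP_OVERHEAD)

print(mss(1500))                       # 1460: plain Ethernet, IPv4
print(mss(1500, overlay_overhead=50))  # 1410: VXLAN on a 1500 underlay
print(mss(1500, ipv6=True))            # 1440: plain Ethernet, IPv6
```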

Under the hood: The full Ethernet frame is actually 1518 bytes: 14-byte Ethernet header (6B dst MAC + 6B src MAC + 2B EtherType) + 1500-byte payload + 4-byte FCS (Frame Check Sequence). With an 802.1Q VLAN tag, the frame grows to 1522 bytes. The "1500 MTU" refers only to the Layer 3 payload.
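
A quick sanity check of the frame arithmetic described above:

```python
# Ethernet frame size arithmetic from the layout above.
ETH_HEADER = 14   # 6B dst MAC + 6B src MAC + 2B EtherType
FCS = 4           # Frame Check Sequence
VLAN_TAG = 4      # 802.1Q tag
MTU = 1500        # Layer 3 payload only

print(ETH_HEADER + MTU + FCS)             # 1518: untagged frame
print(ETH_HEADER + VLAN_TAG + MTU + FCS)  # 1522: with 802.1Q tag
```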

Path MTU Discovery (PMTUD)

PMTUD lets endpoints discover the smallest MTU along a path without manual configuration.

  1. Sender sends packets with the Don't Fragment (DF) bit set
  2. If a router cannot forward (packet > link MTU), it drops the packet and sends back an ICMP "Fragmentation Needed" message
  3. Sender reduces its packet size (RFC 1191-compliant routers report the next-hop MTU in the ICMP message) and retries
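
The feedback loop above can be sketched in a few lines. This is a toy simulation with hypothetical link MTUs, not how the kernel implements PMTUD:

```python
# Toy sketch of the PMTUD feedback loop over a hypothetical path.
def path_mtu_discovery(initial_size, link_mtus):
    """Return the packet size the sender settles on.

    link_mtus: MTUs of the links along the path, in order.
    """
    size = initial_size
    for link_mtu in link_mtus:
        if size > link_mtu:
            # Router drops the DF packet and reports its link MTU in
            # the ICMP "Fragmentation Needed" message; sender shrinks.
            size = link_mtu
    return size

print(path_mtu_discovery(1500, [1500, 1476, 1450]))  # 1450
print(path_mtu_discovery(1400, [1500, 1450]))        # 1400: already fits
```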

PMTUD Blackholes

PMTUD breaks when ICMP is blocked by firewalls. The sender never gets the "too big" message and keeps retrying with oversized packets. This is an MTU blackhole.

Gotcha: Many firewall administrators block all ICMP "for security." This breaks PMTUD and creates MTU blackholes that are extremely hard to diagnose. The correct practice is to allow ICMP Type 3 (Destination Unreachable), especially Code 4 (Fragmentation Needed). Blocking this specific ICMP type causes real outages; allowing it is not a security risk.

Symptoms:

  • Small packets work (ping, DNS, SSH login)
  • Large transfers hang (SCP, HTTP downloads, database queries)
  • TCP connections are established but data transfer stalls

Fragmentation

Debug clue: The classic MTU blackhole pattern: ping -s 56 host works, but ping -M do -s 1472 host fails with "message too long" or simply times out. If the small ping works and the large one silently hangs (no ICMP error returned), a firewall between you and the target is dropping ICMP Type 3 Code 4 messages. Use tracepath to find the hop where the MTU drops.

When DF is not set, oversized packets get fragmented at routers. This is bad:

  • Fragments must all arrive for reassembly — one lost fragment means retransmit everything
  • Fragments increase CPU load on routers
  • Stateful firewalls may not track fragments properly
  • Fragment reassembly attacks are a DoS vector
# Check fragmentation stats
cat /proc/net/snmp | grep -i frag
netstat -s | grep -i frag

Jumbo Frames

Jumbo frames use MTU 9000 (or up to 9216). Benefits: fewer packets, less CPU overhead, higher throughput for bulk transfers.

Fun fact: The term "jumbo frames" was never formally standardized by IEEE. The 9000-byte size became a de facto standard because early Alteon Networks switches supported it. Some vendors support up to 9216 bytes (9000 payload + Ethernet overhead). The name stuck because 9000 bytes felt "jumbo" compared to 1500.

Gotcha: AWS EC2 instances support jumbo frames (MTU 9001) within the same VPC, but traffic crossing a VPC peering connection, VPN, or internet gateway is clamped to 1500. Forgetting this causes silent packet drops on cross-VPC bulk transfers.

Requirements:

  • Every device in the path must support jumbo frames: NICs, switches, routers
  • One device at 1500 in a jumbo-frame path silently drops oversized frames
  • Typically used only within datacenters on dedicated storage/cluster networks

# Set jumbo frames
ip link set dev eth0 mtu 9000

# Persistent (RHEL/CentOS)
# Add MTU=9000 to /etc/sysconfig/network-scripts/ifcfg-eth0

# Persistent (netplan)
# network:
#   ethernets:
#     eth0:
#       mtu: 9000

Debugging MTU Issues

Test with ping

# Test specific packet size (1472 + 28 bytes IP/ICMP header = 1500)
ping -M do -s 1472 10.0.0.1          # Linux (-M do = don't fragment)
ping -D -s 1472 10.0.0.1             # macOS (-D = don't fragment)

# Binary search for path MTU
ping -M do -s 1400 10.0.0.1          # works? try higher
ping -M do -s 1450 10.0.0.1          # works? try higher
ping -M do -s 1472 10.0.0.1          # fails? MTU is between 1450-1472
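
The manual search above can be automated as a binary search. A sketch, where `probe` is a hypothetical stand-in for a DF-bit ping (here faked for illustration):

```python
# Binary search for path MTU, mirroring the manual ping sequence above.
# `probe(payload)` sends a DF packet with that payload size and returns
# True on success.
def find_path_mtu(probe, lo=1200, hi=1500):
    """Largest MTU in [lo, hi] whose payload (MTU - 28) gets through."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid - 28):   # payload = MTU - 28 (IP + ICMP headers)
            lo = mid          # fits: search larger sizes
        else:
            hi = mid - 1      # dropped: search smaller sizes
    return lo

# Fake path that pretends the bottleneck MTU is 1450.
fake_path = lambda payload: payload + 28 <= 1450
print(find_path_mtu(fake_path))  # 1450
```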

Check interface MTU

ip link show eth0 | grep mtu
cat /sys/class/net/eth0/mtu

tracepath (discovers path MTU)

tracepath 10.0.0.1
# Shows MTU at each hop and the overall path MTU

tcpdump for fragmentation

# Look for fragmented packets
tcpdump -i eth0 'ip[6:2] & 0x3fff != 0'

# Look for ICMP "need to frag" messages
tcpdump -i eth0 'icmp[0] == 3 and icmp[1] == 4'
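
The first filter works because bytes 6-7 of the IPv4 header carry the flags (top 3 bits) and the 13-bit fragment offset: a packet is a fragment if the MF (More Fragments) bit is set or the offset is non-zero. The same test in Python, with hand-crafted header bytes for illustration:

```python
# What `ip[6:2] & 0x3fff != 0` actually tests on the IPv4 header.
import struct

def is_fragment(ip_header: bytes) -> bool:
    flags_frag = struct.unpack("!H", ip_header[6:8])[0]
    return (flags_frag & 0x3FFF) != 0  # MF bit (0x2000) + offset bits

# 20-byte headers that differ only in the flags/offset field:
df_packet = bytes(6) + struct.pack("!H", 0x4000) + bytes(12)  # DF set
mf_packet = bytes(6) + struct.pack("!H", 0x2000) + bytes(12)  # MF set

print(is_fragment(df_packet))  # False: DF is bit 0x4000, masked out
print(is_fragment(mf_packet))  # True: first fragment of a series
```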

MTU in Overlay Networks

Overlay networks (VXLAN, GRE, IPsec) add headers, reducing effective MTU:

Overlay           Header Overhead    Effective MTU (on 1500 underlay)
VXLAN             50 bytes           1450
GRE               24 bytes           1476
IPsec (tunnel)    52-73 bytes        1427-1448
WireGuard         60 bytes           1440
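
The table values follow mechanically from the underlay MTU. A sketch for the fixed-overhead overlays (IPsec omitted since its overhead varies with cipher and mode):

```python
# Effective MTU = underlay MTU - overlay header overhead; the MSS then
# follows from the usual "subtract 40" rule for IPv4 + TCP.
OVERHEADS = {"VXLAN": 50, "GRE": 24, "WireGuard": 60}

def overlay_mtu(underlay_mtu, overlay):
    return underlay_mtu - OVERHEADS[overlay]

for name, overhead in OVERHEADS.items():
    eff = overlay_mtu(1500, name)
    print(f"{name}: {overhead}B overhead -> MTU {eff}, MSS {eff - 40}")
```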

Kubernetes/Container MTU

CNI plugins must set the pod MTU correctly. If the underlay is 1500 and the overlay adds 50 bytes, the pod MTU should be 1450. A misconfigured MTU causes:

  • Pod-to-pod large transfers fail
  • DNS works but HTTP hangs
  • Intermittent timeouts on services

War story: A common Kubernetes outage pattern: new cluster, Calico CNI with VXLAN; pods on the same node communicate fine, but cross-node large HTTP responses hang. Cause: Calico defaults to MTU 1440, but the admin manually set the pod MTU to 1500, not accounting for the 50-byte VXLAN overhead. Small DNS and health-check packets work fine; large API responses silently drop. Fix: set the CNI MTU to underlay_MTU - overlay_overhead.

# Check pod interface MTU
kubectl exec -it <pod> -- ip link show eth0

TCP MSS Clamping

Analogy: MSS clamping is like putting a "max height" sign on a tunnel entrance. Instead of letting trucks (large packets) drive in and get stuck, you tell them their maximum size during the TCP handshake (SYN packet). The sender then voluntarily keeps all its segments under that limit. This works even when PMTUD is broken because it operates at connection setup time, not during data transfer.

When you cannot fix MTU everywhere, clamp the TCP MSS to force smaller segments:

# Clamp MSS to match a 1400 MTU
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --set-mss 1360

# Or auto-clamp to PMTU
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu

Quick Reference

Task                       Command
Show interface MTU         ip link show eth0
Set MTU                    ip link set dev eth0 mtu 9000
Test path MTU              ping -M do -s 1472 <host>
Discover path MTU          tracepath <host>
Find fragmented packets    tcpdump -i eth0 'ip[6:2] & 0x3fff != 0'
Clamp TCP MSS              iptables -t mangle ... -j TCPMSS --clamp-mss-to-pmtu
Check frag stats           netstat -s | grep -i frag
