Tags: networking, l1, topic-pack, mtu

Portal | Level: L1: Foundations | Topics: MTU | Domain: Networking
MTU (Maximum Transmission Unit) - Primer¶
Why This Matters¶
MTU mismatches cause some of the most baffling network issues: small pings work but large transfers hang, SSH connects but SCP stalls, web pages partially load. These problems are invisible to basic connectivity tests and can persist for weeks before anyone connects the symptoms to an MTU issue.
Fundamentals¶
MTU is the largest packet size (in bytes) that a network interface will transmit without fragmentation.
Name origin: The 1500-byte Ethernet MTU dates to the original DIX Ethernet specification (1980). It was a compromise between efficiency (larger frames = less overhead) and the cost of buffer memory on early 1980s network interface cards. The 1500-byte standard has survived four decades despite dramatic drops in memory costs.
| Network Type | Typical MTU |
|---|---|
| Ethernet | 1500 bytes |
| Jumbo frames | 9000 bytes |
| PPPoE (DSL) | 1492 bytes |
| VPN tunnels (IPsec) | 1400-1436 bytes |
| VXLAN overlay | 1450 bytes |
| GRE tunnel | 1476 bytes |
The IP header is 20 bytes and TCP header is 20 bytes, so with 1500 MTU the maximum TCP payload (MSS) is 1460 bytes.
Remember: The "MTU math" shortcut: MSS = MTU - 40 (for IPv4 without options). For IPv6: MSS = MTU - 60 (IPv6 header is 40 bytes). For tunnels, subtract the overlay header first, then subtract 40. Example: VXLAN on 1500 underlay = 1450 MTU = 1410 MSS.
Under the hood: The full Ethernet frame is actually 1518 bytes: 14-byte Ethernet header (6B dst MAC + 6B src MAC + 2B EtherType) + 1500-byte payload + 4-byte FCS (Frame Check Sequence). With an 802.1Q VLAN tag, the frame grows to 1522 bytes. The "1500 MTU" refers only to the Layer 3 payload.
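Both sets of header math above can be sanity-checked with shell arithmetic (all values are the ones on this page):

```shell
# Header arithmetic: MSS from MTU, and full frame sizes
mtu=1500
echo "IPv4 MSS:  $((mtu - 20 - 20))"           # 20B IPv4 + 20B TCP = 1460
echo "IPv6 MSS:  $((mtu - 40 - 20))"           # 40B IPv6 + 20B TCP = 1440
echo "VXLAN MSS: $((mtu - 50 - 40))"           # subtract 50B overlay first = 1410
echo "Untagged frame: $((14 + mtu + 4))"       # Ethernet header + payload + FCS = 1518
echo "802.1Q frame:   $((14 + 4 + mtu + 4))"   # + 4B VLAN tag = 1522
```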
Path MTU Discovery (PMTUD)¶
PMTUD lets endpoints discover the smallest MTU along a path without manual configuration.
- Sender sends packets with the Don't Fragment (DF) bit set
- If a router cannot forward (packet > link MTU), it drops the packet and sends back an ICMP "Fragmentation Needed" message
- Sender reduces packet size and retries
PMTUD Blackholes¶
PMTUD breaks when ICMP is blocked by firewalls. The sender never gets the "too big" message and keeps retrying with oversized packets. This is an MTU blackhole.
Gotcha: Many firewall administrators block all ICMP "for security." This breaks PMTUD and creates MTU blackholes that are extremely hard to diagnose. The correct practice is to allow ICMP Type 3 (Destination Unreachable), especially Code 4 (Fragmentation Needed). Blocking this specific ICMP type causes real outages; allowing it is not a security risk.
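A minimal sketch of the corresponding firewall exception, for hosts using iptables (the bare `INPUT` chain and append position are assumptions, not from this page; adapt to your ruleset):

```shell
# IPv4: allow Destination Unreachable / Fragmentation Needed (Type 3, Code 4),
# the message PMTUD depends on
iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT

# IPv6: the equivalent is Packet Too Big (Type 2); there is no fragmentation
# at routers in IPv6, so blocking this breaks large transfers outright
ip6tables -A INPUT -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
```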
Symptoms:

- Small packets work (ping, DNS, SSH login)
- Large transfers hang (SCP, HTTP downloads, database queries)
- TCP connections are established but data transfer stalls
Fragmentation¶
Debug clue: The classic MTU blackhole pattern: `ping -s 56 host` works, but `ping -M do -s 1472 host` fails with "message too long" or simply times out. If the small ping works and the large one silently hangs (no ICMP error returned), a firewall between you and the target is dropping ICMP Type 3 Code 4 messages. Use `tracepath` to find the hop where the MTU drops.
When DF is not set, oversized packets get fragmented at routers. This is bad:
- Fragments must all arrive for reassembly — one lost fragment means retransmit everything
- Fragments increase CPU load on routers
- Stateful firewalls may not track fragments properly
- Fragment reassembly attacks are a DoS vector
Jumbo Frames¶
Jumbo frames use MTU 9000 (or up to 9216). Benefits: fewer packets, less CPU overhead, higher throughput for bulk transfers.
Fun fact: The term "jumbo frames" was never formally standardized by IEEE. The 9000-byte size became a de facto standard because early Alteon Networks switches supported it. Some vendors support up to 9216 bytes (9000 payload + Ethernet overhead). The name stuck because 9000 bytes felt "jumbo" compared to 1500.
Gotcha: AWS EC2 instances support jumbo frames (MTU 9001) within the same VPC, but traffic crossing a VPC peering connection, VPN, or internet gateway is clamped to 1500. Forgetting this causes silent packet drops on cross-VPC bulk transfers.
Requirements:

- Every device in the path must support jumbo frames: NICs, switches, routers
- One device at 1500 in a jumbo-frame path silently drops oversized frames
- Typically used only within datacenters on dedicated storage/cluster networks
```shell
# Set jumbo frames (takes effect immediately; not persistent across reboots)
ip link set dev eth0 mtu 9000

# Persistent (RHEL/CentOS): add MTU=9000 to
#   /etc/sysconfig/network-scripts/ifcfg-eth0

# Persistent (netplan):
# network:
#   ethernets:
#     eth0:
#       mtu: 9000
```
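After raising the MTU, verify the path end to end with a DF ping whose payload exactly fills the new MTU (same 28-byte header math as the 1472/1500 case; target host is a placeholder):

```shell
# DF-ping payload that exactly fills a 9000-byte MTU
mtu=9000
payload=$((mtu - 20 - 8))   # 20B IPv4 header + 8B ICMP header
echo "verify with: ping -M do -s $payload <host>"
```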
Debugging MTU Issues¶
Test with ping¶
```shell
# Test a specific packet size (1472 payload + 28 bytes IP/ICMP header = 1500)
ping -M do -s 1472 10.0.0.1   # Linux (-M do = don't fragment)
ping -D -s 1472 10.0.0.1      # macOS (-D = don't fragment)

# Binary search for the path MTU
ping -M do -s 1400 10.0.0.1   # works? try higher
ping -M do -s 1450 10.0.0.1   # works? try higher
ping -M do -s 1472 10.0.0.1   # fails? max payload is 1450-1471, so path MTU is 1478-1499
```
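The manual probing above is a binary search, which can be scripted. This is a sketch: `probe` is a stand-in that in real use would run `ping -M do -c 1 -W 1 -s $((size - 28)) <host>`, and the function assumes the probe succeeds at the low bound and fails at the high bound:

```shell
# Binary search for the largest size the path accepts
find_pmtu() {
  probe=$1; lo=$2; hi=$3   # assumes $probe succeeds at $lo and fails at $hi
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if "$probe" "$mid"; then lo=$mid; else hi=$mid; fi
  done
  echo "$lo"   # largest size that worked
}
```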
Check interface MTU¶
`ip link show eth0` prints the configured MTU in its first line of output (look for `mtu 1500`); `cat /sys/class/net/eth0/mtu` prints just the number.
tracepath (discovers path MTU)¶
`tracepath -n <host>` traces the route while tracking the path MTU, printing a `pmtu` value as it discovers reductions; unlike DF-bit ping probing it needs no special privileges. The hop where `pmtu` drops is where the path narrows.
tcpdump for fragmentation¶
```shell
# Fragmented packets: MF flag set or nonzero fragment offset
# (ip[6:2] is the IPv4 flags + fragment-offset field; 0x3fff masks off the DF bit)
tcpdump -i eth0 'ip[6:2] & 0x3fff != 0'

# ICMP "Fragmentation Needed" messages (Type 3, Code 4)
tcpdump -i eth0 'icmp[0] == 3 and icmp[1] == 4'
```
MTU in Overlay Networks¶
Overlay networks (VXLAN, GRE, IPsec) add headers, reducing effective MTU:
| Overlay | Header Overhead | Effective MTU (on 1500 underlay) |
|---|---|---|
| VXLAN | 50 bytes | 1450 |
| GRE | 24 bytes | 1476 |
| IPsec (tunnel) | 52-73 bytes | 1427-1448 |
| WireGuard | 60 bytes | 1440 |
Kubernetes/Container MTU¶
CNI plugins must set pod MTU correctly. If the underlay is 1500 and the overlay adds 50 bytes, pod MTU should be 1450. Misconfigured MTU causes:

- Pod-to-pod large transfers fail
- DNS works but HTTP hangs
- Intermittent timeouts on services

War story: A common Kubernetes outage pattern: new cluster, Calico CNI with VXLAN, pods on the same node communicate fine, but cross-node large HTTP responses hang. Cause: Calico defaults to MTU 1440, but the admin manually set pod MTU to 1500, not accounting for the 50-byte VXLAN overhead. Small DNS and health-check packets work fine; large API responses silently drop. Fix: set the CNI MTU to `underlay_MTU - overlay_overhead`.
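That fix, as arithmetic (values from the overlay table above):

```shell
# Pod MTU for a VXLAN-backed CNI on a standard 1500-byte Ethernet underlay
underlay_mtu=1500
vxlan_overhead=50
pod_mtu=$((underlay_mtu - vxlan_overhead))
echo "set the CNI MTU to $pod_mtu"
```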
TCP MSS Clamping¶
Analogy: MSS clamping is like putting a "max height" sign on a tunnel entrance. Instead of letting trucks (large packets) drive in and get stuck, you tell them their maximum size during the TCP handshake (SYN packet). The sender then voluntarily keeps all its segments under that limit. This works even when PMTUD is broken because it operates at connection setup time, not during data transfer.
When you cannot fix MTU everywhere, clamp the TCP MSS to force smaller segments:
```shell
# Clamp MSS to match a 1400-byte path MTU (1400 - 40 = 1360)
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --set-mss 1360

# Or auto-clamp to the discovered path MTU
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu
```
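For hosts that have moved to nftables, the same auto-clamp is available; this sketch assumes an existing `ip filter forward` chain (the table and chain names are assumptions, not from this page):

```shell
# nftables equivalent of --clamp-mss-to-pmtu
nft add rule ip filter forward tcp flags syn tcp option maxseg size set rt mtu
```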
Quick Reference¶
| Task | Command |
|---|---|
| Show interface MTU | ip link show eth0 |
| Set MTU | ip link set dev eth0 mtu 9000 |
| Test path MTU | ping -M do -s 1472 <host> |
| Discover path MTU | tracepath <host> |
| Find fragmented packets | tcpdump -i eth0 'ip[6:2] & 0x3fff != 0' |
| Clamp TCP MSS | iptables -t mangle ... -j TCPMSS --clamp-mss-to-pmtu |
| Check frag stats | netstat -s \| grep -i frag |
Wiki Navigation¶
Related Content¶
- Case Study: Jumbo Frames Partial (Case Study, L2) — MTU
- Case Study: MTU Blackhole TLS Stalls (Case Study, L2) — MTU
- Case Study: SSH Timeout — MTU Mismatch, Fix Is Terraform Variable (Case Study, L2) — MTU
- MTU Flashcards (CLI) (flashcard_deck, L1) — MTU
- Networking Deep Dive (Topic Pack, L1) — MTU
- Runbook: MTU Mismatch (Runbook, L2) — MTU
- Scenario: MTU Blackhole (Scenario, L2) — MTU