Routing - Street-Level Ops
Real-world routing diagnosis and management workflows for production Linux systems.
Task: Find Why Traffic Takes an Unexpected Path
# Application connects to 10.100.5.3 but hits the wrong backend
$ ip route get 10.100.5.3
10.100.5.3 via 10.0.1.1 dev wg0 src 10.0.0.5 uid 0
# Traffic goes through wg0 (WireGuard VPN), not eth0
# Something (a more-specific route or a policy rule) is overriding the default route
# Check all routing tables for competing routes
$ ip route show table all | grep 10.100
10.100.0.0/16 via 10.0.1.1 dev wg0 table custom
10.100.0.0/16 via 10.0.0.1 dev eth0
# Both tables hold a matching route — the rules decide which table is consulted first
$ ip rule show
0: from all lookup local
100: from all lookup custom
32766: from all lookup main
32767:	from all lookup default
# Rule 100 (table custom) runs before 32766 (main), so the wg0 route wins
Remember: Route selection mnemonic: L-M-D — Longest prefix match wins, then Metric (lower is preferred), then order of Declaration. A `/24` always beats a `/16` regardless of metric. This is why VPN split-tunnel routes often override your default route.
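That selection logic can be sketched in pure Python with the stdlib `ipaddress` module (an illustrative model of the kernel's behavior, not its actual FIB code; the route table below is hypothetical):

```python
import ipaddress

def select_route(routes, dst):
    """Pick a route the way the kernel does: longest prefix first,
    then lowest metric among routes of equal prefix length."""
    dst = ipaddress.ip_address(dst)
    candidates = []
    for cidr, gateway, metric in routes:
        net = ipaddress.ip_network(cidr)
        if dst in net:
            candidates.append((net, gateway, metric))
    if not candidates:
        raise LookupError("Network is unreachable")
    # Longest prefix wins; metric only breaks ties at the same length
    return max(candidates, key=lambda r: (r[0].prefixlen, -r[2]))

routes = [
    ("0.0.0.0/0",     "10.0.0.1", 100),  # default route, good metric
    ("10.100.0.0/16", "10.0.1.1", 500),  # VPN route, worse metric
]
print(select_route(routes, "10.100.5.3")[1])  # 10.0.1.1 (the /16 wins despite metric 500)
print(select_route(routes, "8.8.8.8")[1])     # 10.0.0.1 (falls through to the default)
```

Add a `/24` entry and it beats the `/16` the same way, whatever its metric.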
Task: Add a Static Route for an Internal Network
# New subnet 10.200.0.0/16 reachable through gateway 10.0.0.254
$ ip route add 10.200.0.0/16 via 10.0.0.254 dev eth0
# Verify
$ ip route get 10.200.5.1
10.200.5.1 via 10.0.0.254 dev eth0 src 10.0.0.5
# Make persistent — depends on your network manager
# For nmcli:
$ nmcli con mod eth0 +ipv4.routes "10.200.0.0/16 10.0.0.254"
$ nmcli con up eth0
> **Default trap:** `nmcli con up eth0` reapplies the entire connection profile, which can briefly drop traffic. On a production server with active connections, use `nmcli con mod eth0 +ipv4.routes "..." && ip route add ...` to apply the route immediately via `ip route` and persist via `nmcli` — without bouncing the interface.
# For /etc/network/interfaces (Debian):
# up ip route add 10.200.0.0/16 via 10.0.0.254
Task: Diagnose "No Route to Host"
# Error: connect: No route to host
$ ip route get 192.168.50.1
RTNETLINK answers: Network is unreachable
# No route matches. Check the routing table:
$ ip route show
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.5
# Default route exists — but is the gateway reachable?
$ ping -c 1 10.0.0.1
From 10.0.0.5 icmp_seq=1 Destination Host Unreachable
# Gateway is down. Check ARP:
$ ip neigh show 10.0.0.1
10.0.0.1 dev eth0 FAILED
# Gateway is not responding to ARP. Physical issue or gateway is down.
Debug clue:
`ip neigh show` states: `REACHABLE` = recently confirmed, `STALE` = cached but unverified, `FAILED` = ARP request got no response, `INCOMPLETE` = ARP in progress. A `FAILED` gateway means either the gateway is down, or there is a Layer 2 issue (wrong VLAN, cable unplugged, switch port error-disabled).
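Those states are easy to check programmatically. A sketch that parses `ip neigh show` output and flags the entries worth investigating (sample output is hardcoded so the snippet runs anywhere; a real script would read the command's stdout):

```python
def neigh_states(ip_neigh_output):
    """Map each neighbor IP to its NUD state from `ip neigh show` output."""
    states = {}
    for line in ip_neigh_output.strip().splitlines():
        fields = line.split()
        # Typical format: <ip> dev <if> [lladdr <mac>] <STATE>
        states[fields[0]] = fields[-1]
    return states

sample = """\
10.0.0.1 dev eth0 FAILED
10.0.0.7 dev eth0 lladdr 52:54:00:12:34:56 REACHABLE
10.0.0.9 dev eth0 lladdr 52:54:00:ab:cd:ef STALE"""

bad = [ip for ip, st in neigh_states(sample).items()
       if st in ("FAILED", "INCOMPLETE")]
print(bad)  # ['10.0.0.1'] (needs Layer 2 investigation)
```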
Task: Watch for Route Changes in Real-Time
# Something is adding/removing routes — watch it happen
$ ip monitor route
10.244.1.0/24 via 10.0.0.11 dev eth0 proto bird
Deleted: 10.244.1.0/24 via 10.0.0.11 dev eth0 proto bird
10.244.1.0/24 via 10.0.0.12 dev eth0 proto bird
# A routing daemon (bird/calico) is moving routes between nodes
# This can indicate a node failure or pod migration
One-liner:
`ip monitor route` is to routing what `tail -f` is to logs — it shows changes as they happen. Pair it with `ip monitor neigh` to watch ARP table changes simultaneously.
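A sketch of flap detection on top of that stream (the `ip monitor route` lines are hardcoded samples; a real script would read them from the command's stdout):

```python
from collections import Counter

def route_events(monitor_lines):
    """Classify `ip monitor route` lines as add/delete per prefix."""
    events = []
    for line in monitor_lines:
        if line.startswith("Deleted:"):
            events.append(("del", line.split()[1]))
        else:
            events.append(("add", line.split()[0]))
    return events

lines = [
    "10.244.1.0/24 via 10.0.0.11 dev eth0 proto bird",
    "Deleted: 10.244.1.0/24 via 10.0.0.11 dev eth0 proto bird",
    "10.244.1.0/24 via 10.0.0.12 dev eth0 proto bird",
]
churn = Counter(prefix for _, prefix in route_events(lines))
print(churn.most_common(1))  # [('10.244.1.0/24', 3)]: this prefix is flapping
```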
Task: Set Up Policy-Based Routing for Dual ISPs
# Two ISPs: eth0 (ISP1, 203.0.113.1) and eth1 (ISP2, 198.51.100.1)
# Traffic from 10.0.1.0/24 should use ISP2
# Create custom routing table
$ echo "100 isp2" >> /etc/iproute2/rt_tables
# Add default route in the custom table
$ ip route add default via 198.51.100.1 table isp2
# Rule: traffic from the subnet uses ISP2
$ ip rule add from 10.0.1.0/24 table isp2 priority 100
# Verify
$ ip route get 8.8.8.8 from 10.0.1.5
8.8.8.8 from 10.0.1.5 via 198.51.100.1 dev eth1 table isp2
$ ip route get 8.8.8.8 from 10.0.0.5
8.8.8.8 from 10.0.0.5 via 203.0.113.1 dev eth0
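The lookup order (rules tried by priority, then longest prefix within the chosen table) can be modeled in a few lines of Python using the stdlib `ipaddress` module. An illustrative sketch of the setup above, not kernel code:

```python
import ipaddress

# Hypothetical tables and rules mirroring the dual-ISP setup above
tables = {
    "main": [("0.0.0.0/0", "203.0.113.1")],   # ISP1 default
    "isp2": [("0.0.0.0/0", "198.51.100.1")],  # ISP2 default
}
rules = [(100, "10.0.1.0/24", "isp2"), (32766, "0.0.0.0/0", "main")]

def lookup(dst, src):
    """Rules are tried in priority order; within the chosen table,
    longest prefix wins. A table with no match falls through to the next rule."""
    dst, src = ipaddress.ip_address(dst), ipaddress.ip_address(src)
    for _priority, src_net, table in sorted(rules):
        if src not in ipaddress.ip_network(src_net):
            continue  # rule does not match this source
        matches = [(ipaddress.ip_network(cidr), gw)
                   for cidr, gw in tables[table]
                   if dst in ipaddress.ip_network(cidr)]
        if matches:
            return max(matches, key=lambda m: m[0].prefixlen)[1]
    raise LookupError("Network is unreachable")

print(lookup("8.8.8.8", "10.0.1.5"))  # 198.51.100.1 (ISP2)
print(lookup("8.8.8.8", "10.0.0.5"))  # 203.0.113.1 (ISP1)
```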
Task: Create a Blackhole Route for DDoS Mitigation
# Null-route a known attack prefix: everything routed toward it is dropped
$ ip route add blackhole 198.51.100.0/24
# Verify
$ ip route get 198.51.100.5
RTNETLINK answers: Invalid argument # Matched the blackhole: silently dropped
# List all blackhole routes
$ ip route show type blackhole
blackhole 198.51.100.0/24
# Remove when attack is mitigated
$ ip route del blackhole 198.51.100.0/24
Under the hood: A blackhole route silently discards packets in the kernel's routing layer — no ICMP response, no CPU spent on firewall processing. It is faster than an iptables DROP rule because the packet never enters netfilter. For DDoS, blackhole routes are preferred because they consume almost zero resources per packet.
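In the lookup model, a blackhole is just another route entry that competes by longest prefix like any other; only its action differs. A sketch (illustrative Python, hypothetical routes):

```python
import ipaddress

# Routes as (prefix, action): "blackhole" competes by prefix length
# exactly like a normal next-hop entry
routes = [
    ("0.0.0.0/0", "via 10.0.0.1"),
    ("198.51.100.0/24", "blackhole"),
]

def route_action(dst):
    dst = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(cidr), action) for cidr, action in routes
               if dst in ipaddress.ip_network(cidr)]
    net, action = max(matches, key=lambda m: m[0].prefixlen)
    # blackhole: drop silently, no ICMP, the packet never reaches netfilter
    return "drop" if action == "blackhole" else action

print(route_action("198.51.100.5"))  # drop
print(route_action("8.8.8.8"))       # via 10.0.0.1
```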
Task: Debug Asymmetric Routing
# TCP connections reset intermittently
# Forward path: client -> firewall -> server
# Return path: server -> different router -> client (bypasses firewall)
# Check with tcpdump — see SYN but no SYN-ACK on the firewall
$ tcpdump -ni eth0 host 10.0.0.20 and port 80
10.0.0.50.43210 > 10.0.0.20.80: Flags [S] # SYN arrives
# No SYN-ACK — it is leaving via a different path
# On the server, check the return route
$ ip route get 10.0.0.50
10.0.0.50 via 10.0.1.1 dev eth1 # Different gateway!
# Fix: add a route so return traffic uses the same path
$ ip route add 10.0.0.0/24 via 10.0.0.1 dev eth0
Under the hood: Stateful firewalls (iptables with conntrack, cloud security groups) only pass packets that belong to a connection they have tracked from the start. When the SYN-ACK returns via a different path, the firewall never sees the handshake complete, so the client's follow-up packets have no matching conntrack entry and are dropped as invalid; any stateful device on the return path drops the SYN-ACK for the same reason. Asymmetric routing through a stateful firewall is always broken: either fix the routing or disable connection tracking on the affected path.
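The failure mode can be sketched as a toy state machine (a model of strict connection tracking, not real netfilter code):

```python
class Conntrack:
    """Toy stateful firewall: tracks one TCP handshake per flow."""
    def __init__(self):
        self.flows = {}  # (src, dst) -> state

    def packet(self, src, dst, flags):
        fwd, rev = (src, dst), (dst, src)
        if flags == "SYN" and fwd not in self.flows:
            self.flows[fwd] = "SYN_SENT"        # new connection attempt
            return "ACCEPT"
        if flags == "SYN-ACK" and self.flows.get(rev) == "SYN_SENT":
            self.flows[rev] = "ESTABLISHED"     # handshake observed
            return "ACCEPT"
        if "ESTABLISHED" in (self.flows.get(fwd), self.flows.get(rev)):
            return "ACCEPT"
        return "DROP"  # no matching state: the packet is invalid

fw = Conntrack()
print(fw.packet("client", "server", "SYN"))  # ACCEPT (new flow tracked)
# The SYN-ACK returned via another router and bypassed this firewall,
# so the tracked flow never reaches ESTABLISHED...
print(fw.packet("client", "server", "ACK"))  # DROP (no matching state)
```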
Task: Troubleshoot Missing Default Route After DHCP
# Host lost external connectivity. Local traffic works.
$ ip route show
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.5
# No default route!
# DHCP lease may have expired
$ journalctl -t dhclient -n 20    # -t filters by syslog tag (dhclient is often not its own unit)
# or
$ journalctl -u NetworkManager -n 20 | grep -i dhcp
# Force DHCP renewal
$ dhclient -r eth0 && dhclient eth0
# Verify default route returns
$ ip route show default
default via 10.0.0.1 dev eth0 proto dhcp metric 100
Debug clue:
The `proto` field in route output tells you who installed the route: `kernel` = directly connected, `dhcp` = DHCP client, `bird`/`bgp` = routing daemon, `static` = manually added. When a route mysteriously appears or disappears, `proto` is the first clue to which process is responsible.
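Pulling the `proto` field out per route is a one-screen script. A sketch with sample `ip route show` output hardcoded (a real script would read the command's stdout):

```python
def route_protos(ip_route_output):
    """Map each route's destination to its proto field from `ip route show`."""
    protos = {}
    for line in ip_route_output.strip().splitlines():
        fields = line.split()
        # Some routes print no proto field at all; record those as "(none)"
        proto = fields[fields.index("proto") + 1] if "proto" in fields else "(none)"
        protos[fields[0]] = proto
    return protos

sample = """\
default via 10.0.0.1 dev eth0 proto dhcp metric 100
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.5
10.244.1.0/24 via 10.0.0.11 dev eth0 proto bird"""

print(route_protos(sample)["default"])  # dhcp
```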
Task: Inspect Kubernetes CNI Routes
# Pod connectivity issues — check node routing table
$ ip route show | grep -E "10.244|10.96"
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.0.0.11 dev eth0 proto bird
10.244.2.0/24 via 10.0.0.12 dev eth0 proto bird
# Each node's pod CIDR has a route. Missing route = CNI issue
# If 10.244.3.0/24 is missing, pods on the node that owns that CIDR are unreachable
# Check the CNI agent
$ kubectl -n kube-system get pods -l app=calico-node
$ kubectl -n kube-system logs calico-node-xxxxx | tail -20
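The "missing route = CNI issue" check is easy to automate. A sketch (the node CIDR list and route output are hypothetical hardcoded samples; a real script would pull them from `kubectl get nodes -o wide` and `ip route show`):

```python
def missing_pod_routes(expected_cidrs, ip_route_output):
    """Return pod CIDRs that have no route on this node."""
    routed = {line.split()[0] for line in ip_route_output.strip().splitlines()}
    return sorted(set(expected_cidrs) - routed)

# One pod CIDR per node in the cluster (hypothetical values)
expected = ["10.244.0.0/24", "10.244.1.0/24", "10.244.2.0/24", "10.244.3.0/24"]

routes = """\
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.0.0.11 dev eth0 proto bird
10.244.2.0/24 via 10.0.0.12 dev eth0 proto bird"""

print(missing_pod_routes(expected, routes))  # ['10.244.3.0/24']: check that node's CNI agent
```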
Gotcha:
`ip route add` changes are volatile — they vanish on reboot. If you add a static route to fix a production issue at 3 AM, also persist it with `nmcli` or in `/etc/network/interfaces`. Otherwise the next reboot brings the outage back and nobody remembers the fix.
Emergency: Delete a Bad Route Causing Outage
# Wrong route was added — production traffic going to wrong gateway
$ ip route del 10.100.0.0/16 via 10.0.99.1
$ ip route add 10.100.0.0/16 via 10.0.0.1
# Verify immediately
$ ip route get 10.100.5.1
10.100.5.1 via 10.0.0.1 dev eth0 src 10.0.0.5