Solution: LACP Mismatch / One Link Hot
Summary
Both LACP links are up and negotiated, but traffic is imbalanced because the switch hashes its port-channel on src-mac while the server uses layer3+4. MAC-based hashing sees almost no variety on the path toward the server: routed return traffic arrives with the upstream gateway's MAC as its source and bond0's MAC as its destination, so the switch sends nearly all of it down a single physical link. That link saturates with return traffic while the other link's receive side sits nearly idle.
Senior Workflow
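Step 1: Verify bond status on the server
cat /proc/net/bonding/bond0
Check: both slaves show "MII Status: up", the LACP partner MAC address is populated (not 00:00:00:00:00:00), and each slave's Aggregator ID matches the active aggregator's.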
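The same checks can be scripted. The sketch below assumes the usual /proc/net/bonding/bond0 layout for an 802.3ad bond: an "Active Aggregator Info" block followed by one "Slave Interface" section per member. Field labels can differ between kernel versions, so treat it as a starting point. `check_bond` is a hypothetical helper name, not part of any tool.

```bash
#!/usr/bin/env bash
# check_bond (hypothetical helper): parse the /proc/net/bonding/<bond> layout
# (assumed; may vary by kernel) and exit non-zero if any check fails.
check_bond() {
  local file=${1:-/proc/net/bonding/bond0}
  awk '
    # Track the current slave section; empty means bond-level lines.
    /^Slave Interface:/ { slave = $3; next }
    /^MII Status:/ && slave != "" { mii[slave] = $3 }
    # An Aggregator ID before any slave section is the active aggregator.
    /Aggregator ID:/ { if (slave == "") active = $3; else agg[slave] = $3 }
    /Partner Mac Address:/ && slave == "" { partner = $4 }
    END {
      ok = 1; n = 0
      for (s in mii) {
        n++
        if (mii[s] != "up") { print s ": MII Status " mii[s]; ok = 0 }
        if (agg[s] != active) { print s ": Aggregator ID " agg[s] ", active is " active; ok = 0 }
      }
      if (n < 2) { print "expected at least 2 slaves, found " n; ok = 0 }
      if (partner == "" || partner == "00:00:00:00:00:00") { print "no LACP partner"; ok = 0 }
      if (ok) print "OK: " n " slaves up in aggregator " active ", partner " partner
      exit !ok
    }' "$file"
}
```

Run it as `check_bond` (defaults to bond0) or `check_bond /proc/net/bonding/bond1`; any printed line other than the final "OK" names the failing check.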
Step 2: Check per-link traffic stats
# Per-interface byte counters
cat /sys/class/net/eth2/statistics/tx_bytes
cat /sys/class/net/eth2/statistics/rx_bytes
cat /sys/class/net/eth3/statistics/tx_bytes
cat /sys/class/net/eth3/statistics/rx_bytes
# Or use ethtool -S (counter names are driver-specific; adjust the pattern if nothing matches)
ethtool -S eth2 | grep -E 'rx_bytes|tx_bytes'
ethtool -S eth3 | grep -E 'rx_bytes|tx_bytes'
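Expect: eth2's rx_bytes is far larger than eth3's (often by orders of magnitude), while tx_bytes is usually much closer to balanced because the server already hashes its egress traffic with layer3+4.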
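To put a number on the imbalance, a small sketch can turn the raw counters into per-link percentages. It reads the same sysfs files as the commands above. `link_share` and the `SYSFS_NET` override (there only so the function can be pointed at a test directory) are hypothetical names.

```bash
#!/usr/bin/env bash
# link_share (hypothetical helper): print each member's share of the bond's
# rx and tx bytes. SYSFS_NET is a hypothetical override, default /sys/class/net.
link_share() {
  local root=${SYSFS_NET:-/sys/class/net} d ifc bytes total
  for d in rx tx; do
    # First pass: total bytes across all members for this direction.
    total=0
    for ifc in "$@"; do
      total=$(( total + $(<"$root/$ifc/statistics/${d}_bytes") ))
    done
    # Second pass: each member's absolute count and integer percentage.
    for ifc in "$@"; do
      bytes=$(<"$root/$ifc/statistics/${d}_bytes")
      printf '%s %s_bytes=%s (%d%%)\n' "$ifc" "$d" "$bytes" \
        $(( total > 0 ? bytes * 100 / total : 0 ))
    done
  done
}
```

Usage: `link_share eth2 eth3`. In this scenario the rx lines should show one link near 100%, while the tx lines sit much closer to an even split.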
Step 3: Check server-side hashing
cat /sys/class/net/bond0/bonding/xmit_hash_policy
# Expected: layer3+4 1 (policy name, then its numeric ID; hashes on src/dst IP + src/dst TCP/UDP port)
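The Linux bonding documentation describes layer3+4 roughly as: start from the TCP/UDP port pair, XOR in the source and destination IPs, fold the high bits down with two XOR-shifts, shift right by one (newer kernels), then reduce modulo the number of slaves. The sketch below is a simplified model of that formula. Byte order and kernel-version details are ignored (an assumption), so its link numbers will not match a real bond; it only illustrates that a given flow always lands on the same link while flows differing only in source port can land on different links. `ip_to_int` and `layer34_link` are hypothetical names.

```bash
#!/usr/bin/env bash
# Simplified model of the layer3+4 transmit hash from the Linux bonding docs.
# Byte order and per-kernel details are ignored (assumption): illustrative only.
# ip_to_int and layer34_link are hypothetical helper names.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Args: src_ip dst_ip src_port dst_port n_links; prints the chosen link index.
layer34_link() {
  local sip dip h
  sip=$(ip_to_int "$1")
  dip=$(ip_to_int "$2")
  h=$(( ($3 << 16) | $4 ))   # port pair as one 32-bit word
  h=$(( h ^ sip ^ dip ))      # mix in both IP addresses
  h=$(( h ^ (h >> 16) ))      # fold the high bits down
  h=$(( h ^ (h >> 8) ))
  h=$(( h >> 1 ))             # drop the lowest bit (newer kernels)
  echo $(( h % $5 ))
}

# Four flows between the same two hosts, differing only in source port.
for sport in 40000 40001 40002 40003; do
  printf 'sport %d -> link %d\n' "$sport" "$(layer34_link 10.0.0.1 10.0.0.2 "$sport" 443 2)"
done
```

Under this model the four flows land on links 0, 0, 1, 1, and re-running any one flow always yields the same link. That per-flow determinism is why a single large flow can never be split across links.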
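Step 4: Check switch-side hashing
# On the switch (Cisco IOS example):
show etherchannel load-balance
# Cisco NX-OS equivalent:
show port-channel load-balance
# If it shows: src-mac
# the switch picks the member link for traffic toward the server from the
# frame's source MAC alone. Routed return traffic all carries the same source
# MAC (the gateway's) and the same destination MAC (bond0's), so nearly all of
# it hashes to one link.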
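The collapse can be seen with an even simpler toy model. Vendors do not all use the same formula, so the XOR-of-last-octets hash below is an assumption, not Cisco's algorithm. What it captures is that only MAC addresses enter the hash, so every routed flow toward the server produces the same input. If many clients sit in the server's own VLAN, their distinct source MACs would spread traffic somewhat; the collapse is worst for routed traffic. `mac_link` and both MAC addresses are hypothetical.

```bash
#!/usr/bin/env bash
# Toy MAC-based hash (assumption, not any vendor's real formula): XOR of the
# last octet of the source and destination MACs, modulo the link count.
# mac_link is a hypothetical helper name.
mac_link() {  # args: src_mac dst_mac n_links
  local s=${1##*:} d=${2##*:}   # keep only the last octet of each MAC
  echo $(( (16#$s ^ 16#$d) % $3 ))
}

gw_mac=00:11:22:33:44:01    # hypothetical upstream gateway MAC
bond_mac=52:54:00:aa:bb:02  # hypothetical bond0 MAC
# Four different flows toward the server: their IPs and ports never reach the hash.
for sport in 40000 40001 40002 40003; do
  printf 'sport %d -> link %d\n' "$sport" "$(mac_link "$gw_mac" "$bond_mac" 2)"
done
```

All four flows map to link 1 here. A hash that also takes IPs and L4 ports, as configured in Step 5, gives each flow a different input.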
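Step 5: Fix the switch hashing algorithm
# Keyword names differ between platforms and software releases; confirm the
# supported options with the CLI's '?' help before applying. On many platforms
# this is a global setting that affects every port-channel on the switch.
# Cisco IOS:
port-channel load-balance src-dst-ip-port
# Cisco NX-OS:
port-channel load-balance ethernet source-dest-ip-port
# Arista (confirm this applies to port-channel hashing on your platform):
ip load-sharing hash algorithm symmetric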
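Step 6: Verify after the fix
# Let traffic settle, then watch the counters. They are cumulative since the
# interface came up, so compare how fast each one grows rather than its total.
watch -d 'grep . /sys/class/net/eth2/statistics/rx_bytes /sys/class/net/eth3/statistics/rx_bytes'
Both rx_bytes counters should now grow at comparable rates. With many flows, expect a roughly even split rather than an exact 50/50.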
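Because the sysfs counters are cumulative, the old skew stays in the totals after the fix. Comparing two samples taken a few seconds apart shows the current split instead. A sketch, assuming integer intervals; `rx_delta_share` and the `SYSFS_NET` test override are hypothetical names.

```bash
#!/usr/bin/env bash
# rx_delta_share (hypothetical): sample rx_bytes twice, INTERVAL whole seconds
# apart, and print each link's share of the bytes received in between.
# SYSFS_NET is a hypothetical override, default /sys/class/net.
rx_delta_share() {  # args: interval_seconds iface...
  local root=${SYSFS_NET:-/sys/class/net} interval=$1 i total=0
  shift
  local -a ifs=("$@") before=() delta=()
  for i in "${!ifs[@]}"; do
    before[i]=$(<"$root/${ifs[i]}/statistics/rx_bytes")
  done
  sleep "$interval"
  for i in "${!ifs[@]}"; do
    delta[i]=$(( $(<"$root/${ifs[i]}/statistics/rx_bytes") - before[i] ))
    total=$(( total + delta[i] ))
  done
  for i in "${!ifs[@]}"; do
    printf '%s rx_delta=%d (%d%%)\n' "${ifs[i]}" "${delta[i]}" \
      $(( total > 0 ? delta[i] * 100 / total : 0 ))
  done
}
```

Usage: `rx_delta_share 10 eth2 eth3`. Before the fix one line sits near 100%; afterwards both should land in the same neighbourhood.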
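Step 7: Monitor for microbursts
A link whose average utilization looks below line rate can still drop packets during short bursts when it carries all the return traffic. Watch the output-drop counters on the switch member ports and rx_dropped on the server interfaces; they should stop incrementing after the fix.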
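On the server side, the per-interface rx_dropped counter can be polled in the same way, printing only increments so a quiet run produces no output. Switch-side output drops need the switch CLI and are not covered. Which drops a given NIC driver reports in rx_dropped, as opposed to driver-specific counters in `ethtool -S`, varies. `watch_drops` and `SYSFS_NET` are hypothetical names.

```bash
#!/usr/bin/env bash
# watch_drops (hypothetical): poll rx_dropped SAMPLES times, INTERVAL seconds
# apart, and report only interfaces whose counter increased.
# SYSFS_NET is a hypothetical override, default /sys/class/net.
watch_drops() {  # args: interval_seconds samples iface...
  local root=${SYSFS_NET:-/sys/class/net} interval=$1 samples=$2 n ifc now
  shift 2
  local -A last=()
  for ifc in "$@"; do
    last[$ifc]=$(<"$root/$ifc/statistics/rx_dropped")
  done
  for (( n = 1; n <= samples; n++ )); do
    sleep "$interval"
    for ifc in "$@"; do
      now=$(<"$root/$ifc/statistics/rx_dropped")
      if (( now > ${last[$ifc]} )); then
        printf 'sample %d: %s rx_dropped +%d\n' "$n" "$ifc" $(( now - ${last[$ifc]} ))
      fi
      last[$ifc]=$now
    done
  done
}
```

Usage: `watch_drops 5 12 eth2 eth3` polls for one minute; after the fix it should print nothing.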
Common Pitfalls
- "Both links are up, so it must be balanced": LACP UP status only means the aggregation is negotiated. Distribution depends on the hashing algorithm.
- Expecting perfect 50/50 split: Hash-based distribution is statistical. With few flows, distribution can still be skewed. layer3+4 gives better distribution with many flows.
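- Only fixing one side: Each end hashes only its own egress traffic. The server's policy decides which link carries server-to-switch traffic; the switch's policy decides which link carries switch-to-server traffic. Both hashing policies matter.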
- MAC-based hashing with bonds: A bond presents a single MAC, and routed traffic toward it arrives from a single gateway MAC, so src-mac (or dst-mac, or src-dst-mac) hashing on the switch sends nearly all return traffic down one link.
- Ignoring the traffic pattern: If the workload is dominated by one large flow (e.g., replication from a single source), no hashing policy can split that single flow across links.
- Confusing LACP rate with load balance: LACP fast/slow rate affects failure detection speed, not traffic distribution.
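The single-large-flow pitfall is easy to underestimate, because the hash can balance flow counts perfectly while bytes stay lopsided. A small sketch that sums per-flow byte counts per link makes this concrete; `link_load`, its input format (one `<link> <bytes>` line per flow), and the sample numbers are all hypothetical.

```bash
#!/usr/bin/env bash
# link_load (hypothetical): read "<link_index> <bytes>" lines, one per flow,
# and print each link's integer share of the total bytes.
link_load() {
  local -a sum=()
  local link bytes total=0 i
  while read -r link bytes; do
    sum[link]=$(( ${sum[link]:-0} + bytes ))
    total=$(( total + bytes ))
  done
  for i in "${!sum[@]}"; do
    printf 'link%d %d%%\n' "$i" $(( total > 0 ? sum[i] * 100 / total : 0 ))
  done
}

# Two flows per link (perfect flow-count balance), but one is a large
# replication stream. The byte counts are made up for illustration.
link_load <<'EOF'
0 9000000000
0 1000
1 2000
1 3000
EOF
```

This prints `link0 99%` and `link1 0%`: balancing flows is not the same as balancing bytes, and no hash policy can move part of the replication stream to link1.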