LACP - Street-Level Ops

Real-world bond/LACP diagnosis and management workflows for production Linux servers.

Task: Verify Bond Status and Health

# Full bond status — the single most important diagnostic
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.15.0

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
LACP Rate: fast
Aggregator ID: 1

Slave Interface: eth0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Partner MAC: 00:1a:2b:3c:4d:01

Slave Interface: eth1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Partner MAC: 00:1a:2b:3c:4d:02

# Quick check — is bond up with both members?
$ ip -br link show bond0
bond0  UP  aa:bb:cc:dd:ee:00 <BROADCAST,MULTICAST,MASTER,UP>

$ ip -br link show eth0
eth0   UP  aa:bb:cc:dd:ee:00 <BROADCAST,MULTICAST,SLAVE,UP>

Remember: of bonding modes 0-6, the two you need to know are 1 (active-backup, no switch config needed) and 4 (802.3ad/LACP, requires switch config). Mode 0 (round-robin) can reorder packets within a flow; mode 6 (balance-alb) is clever but fragile. In production, stick to 1 or 4.
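To see which mode a live bond is actually running, read sysfs: /sys/class/net/bond0/bonding/mode prints the mode name followed by its number. A small helper for going the other way, from number to name; the table itself is fixed by the kernel bonding driver:

```shell
# Translate a bonding mode number (as shown in sysfs) to its name
bond_mode_name() {
  case "$1" in
    0) echo balance-rr ;;
    1) echo active-backup ;;
    2) echo balance-xor ;;
    3) echo broadcast ;;
    4) echo 802.3ad ;;
    5) echo balance-tlb ;;
    6) echo balance-alb ;;
    *) echo unknown ;;
  esac
}

bond_mode_name 4   # 802.3ad
```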

Task: Create LACP Bond from Scratch

# Load bonding module
$ modprobe bonding

# Create bond with LACP (mode 4)
$ ip link add bond0 type bond mode 802.3ad
$ ip link set bond0 type bond lacp_rate fast
$ ip link set bond0 type bond xmit_hash_policy layer3+4
$ ip link set bond0 type bond miimon 100

# Add member interfaces (must be down first)
$ ip link set eth0 down
$ ip link set eth1 down
$ ip link set eth0 master bond0
$ ip link set eth1 master bond0

# Bring everything up
$ ip link set bond0 up
$ ip addr add 10.0.0.5/24 dev bond0
$ ip route add default via 10.0.0.1

# Verify LACP negotiation
$ grep -A2 "Partner" /proc/net/bonding/bond0
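The step-by-step creation above can also be collapsed into one ip(8) call. Setting everything at creation time sidesteps the ordering constraint that some options (lacp_rate among them) can only be changed while the bond is down and has no members:

```shell
# Same bond, single command
ip link add bond0 type bond mode 802.3ad lacp_rate fast \
    xmit_hash_policy layer3+4 miimon 100
```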

Default trap: miimon defaults to 0 (disabled), meaning the bond will not detect link failures. Always set miimon 100 (check every 100ms) when creating a bond. Without it, a failed link stays in the bond indefinitely, silently black-holing traffic sent to it.
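A quick host-side guard against that trap: read miimon from sysfs for every bond on the box. The paths below are the bonding driver's standard sysfs layout; the helper takes the path explicitly so it is easy to test:

```shell
# Flag any bond whose miimon is 0 (link failure detection disabled).
# $1 = path to a bond's sysfs "bonding" directory
check_miimon() {
  mm=$(cat "$1/miimon")
  bond=$(basename "$(dirname "$1")")
  if [ "$mm" -eq 0 ]; then
    echo "WARN: $bond has miimon=0 (link failures will go undetected)"
  else
    echo "OK: $bond miimon=${mm}ms"
  fi
}

# bonding_masters lists every bond on the host (absent if none)
for b in $(cat /sys/class/net/bonding_masters 2>/dev/null); do
  check_miimon "/sys/class/net/$b/bonding"
done
```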

# Bond shows up but only one member is aggregated
$ grep -B1 "Aggregator ID" /proc/net/bonding/bond0
Slave Interface: eth0
Aggregator ID: 1

Slave Interface: eth1
Aggregator ID: 2    # Different aggregator = NOT bundled

# Causes: switch LACP not configured on eth1's port,
# or speed/duplex mismatch

# Check physical link
$ ethtool eth1 | grep -E "Speed|Duplex|Link"
Speed: 1000Mb/s
Duplex: Full
Link detected: yes

# Compare with eth0
$ ethtool eth0 | grep -E "Speed|Duplex|Link"
Speed: 10000Mb/s
Duplex: Full
Link detected: yes

# Speed mismatch — eth1 negotiated 1G instead of 10G
# Fix on switch side or check cabling

Debug clue: Different Aggregator IDs on bond members means LACP did not bundle them together. The three most common causes: (1) switch port-channel not configured on one port, (2) speed/duplex mismatch between members, (3) LACP rate mismatch (host says fast, switch says slow). Check the switch side first — that is where the problem usually is.
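That check can be scripted: pair each slave with its aggregator ID so a mismatch jumps out (or feeds an alert). A sketch that parses the /proc format shown above; it reads a file argument or stdin:

```shell
# Print "<slave> <aggregator-id>" per member; bundled members share
# one ID. Works on /proc/net/bonding/<bond> or the same text on stdin.
# The guard on "slave" skips the bond-level Aggregator ID line that
# appears before any slave sections.
agg_ids() {
  awk '/^Slave Interface:/ { slave=$3 }
       /Aggregator ID:/ && slave != "" { print slave, $3; slave="" }' "$@"
}

# Usage: agg_ids /proc/net/bonding/bond0
```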

Task: Test Failover by Downing a Member

# Simulate link failure
$ ip link set eth0 down

# Check bond reacts
$ grep -A1 "Slave Interface" /proc/net/bonding/bond0
Slave Interface: eth0
MII Status: down

Slave Interface: eth1
MII Status: up

# Verify traffic still flows
$ ping -c 3 10.0.0.1
3 packets transmitted, 3 received, 0% packet loss

# Restore
$ ip link set eth0 up

# Watch bond re-aggregate (LACP rate fast = 1s PDUs, recovery ~3s)
$ watch -n1 'grep -A3 "Slave Interface" /proc/net/bonding/bond0'

Under the hood: LACP fast rate sends PDUs every 1 second (3-second timeout), slow rate sends every 30 seconds (90-second timeout). Use fast for server bonds where quick failover matters. Use slow for stable infrastructure links (switch-to-switch) to reduce control plane overhead. Mismatched rates between host and switch cause the slow side to declare the link dead after 90 seconds of silence.
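The 3x rule above in executable form (the host's current rate is readable at /sys/class/net/bond0/bonding/lacp_rate). A toy helper, assuming only the two standard rates exist:

```shell
# LACP timeout is 3 missed PDUs: fast = 3 x 1s, slow = 3 x 30s
lacp_timeout_secs() {
  case "$1" in
    fast) echo 3 ;;
    slow) echo 90 ;;
    *) echo "unknown lacp_rate: $1" >&2; return 1 ;;
  esac
}

lacp_timeout_secs fast   # 3
lacp_timeout_secs slow   # 90
```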

Task: Set Up Active-Backup Bond (No Switch Config Needed)

# Mode 1 — simplest redundancy
$ ip link add bond0 type bond mode active-backup
$ ip link set bond0 type bond miimon 100
$ ip link set bond0 type bond primary eth0

$ ip link set eth0 down && ip link set eth0 master bond0
$ ip link set eth1 down && ip link set eth1 master bond0
$ ip link set bond0 up

# Check which interface is active
$ grep "Currently Active" /proc/net/bonding/bond0
Currently Active Slave: eth0

One-liner: Mode 1 (active-backup) is the "just works" bond: no switch configuration, no LACP negotiation, and failover within the miimon interval (roughly 100 ms at miimon=100). Use it when you need redundancy but cannot coordinate with the network team, or when each NIC connects to a different switch (dual-homed server).
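Active-backup also supports manual failover, which is handy before pulling a cable for maintenance. The active_slave sysfs node is standard for the bonding driver; the parse helper below is a convenience of my own that reads the status format shown above:

```shell
# Manually switch the active member (guarded: no-op if bond0 absent)
f=/sys/class/net/bond0/bonding/active_slave
if [ -w "$f" ]; then echo eth1 > "$f"; fi

# Pull "Currently Active Slave" out of bond status text (file or stdin)
active_slave() {
  awk -F': ' '/^Currently Active Slave/ { print $2 }' "$@"
}
```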

Task: Configure Bond with nmcli (Persistent)

# Create LACP bond
$ nmcli con add type bond con-name bond0 ifname bond0 \
    bond.options "mode=802.3ad,lacp_rate=fast,xmit_hash_policy=layer3+4,miimon=100"

# Add members
$ nmcli con add type ethernet con-name bond0-eth0 ifname eth0 master bond0
$ nmcli con add type ethernet con-name bond0-eth1 ifname eth1 master bond0

# Assign IP
$ nmcli con mod bond0 ipv4.addresses 10.0.0.5/24
$ nmcli con mod bond0 ipv4.gateway 10.0.0.1
$ nmcli con mod bond0 ipv4.method manual

# Activate
$ nmcli con up bond0
$ nmcli con up bond0-eth0
$ nmcli con up bond0-eth1

# Verify
$ nmcli con show bond0 | grep -i bond

Task: Check Traffic Distribution Across Members

# Monitor per-member traffic counters
$ ip -s link show eth0 | grep -A1 "RX\|TX"
    RX:  bytes  packets errors dropped
    1284923847  9482731      0       0
    TX:  bytes  packets errors dropped
    2847291034  8173920      0       0

$ ip -s link show eth1 | grep -A1 "RX\|TX"
    RX:  bytes  packets errors dropped
    1301847293  9518234      0       0
    TX:  bytes  packets errors dropped
    2891034728  8209134      0       0

# Roughly balanced — good. If one link carries 90%+ traffic,
# consider switching hash policy to layer3+4
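To turn those counters into a number you can alert on, a tiny arithmetic helper (a sketch; feed it the TX byte counts of the two members):

```shell
# Percentage of combined TX bytes carried by the first member
tx_share_pct() {
  echo $(( 100 * $1 / ($1 + $2) ))
}

tx_share_pct 2847291034 2891034728   # 49, nicely balanced
```

Anything north of ~90 means one member is doing all the work and the hash policy deserves a look.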

Gotcha: xmit_hash_policy=layer2 (the default) hashes on MAC addresses. In many server environments, all traffic goes to one gateway MAC, so one bond member carries 100% of outbound traffic. Switch to layer3+4 (hashes on IP + port) for proper distribution across members. This requires the switch to also use a compatible hash algorithm.
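A toy model of why that happens (not the kernel's real hash, just its shape): layer2 keys on the destination MAC, which is constant when everything exits via one gateway, while layer3+4 mixes in the TCP/UDP ports, which differ per flow:

```shell
# Simplified 2-member hash. layer2 uses only the (constant) gateway
# MAC byte; layer3+4 XORs the ports. Illustration only: the kernel's
# real hash also folds in IP addresses and uses different mixing.
pick_member() {  # $1=policy $2=dst_mac_last_byte $3=src_port $4=dst_port
  case "$1" in
    layer2)     echo $(( $2 % 2 )) ;;
    "layer3+4") echo $(( ($3 ^ $4) % 2 )) ;;
  esac
}

pick_member layer2 1 40000 443      # same member for every flow to this MAC
pick_member layer3+4 1 40001 443    # varies per flow with the source port
```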


Task: Monitor Bond for Flapping

# Watch for link state changes in real-time
$ journalctl -k -f | grep -i bond
bond0: link status up for interface eth0, enabling it in 800 ms
bond0: link status down for interface eth0, disabling it
bond0: link status up for interface eth0, enabling it in 800 ms

# Repeated up/down = flapping. Check:
# - Cable/SFP on eth0
# - Switch port errors: ethtool -S eth0 | grep error
# - LACP rate mismatch between host and switch

Scale note: At scale with hundreds of bonded servers, centralize bond health monitoring. Export /proc/net/bonding/bond0 via a Prometheus textfile collector or custom exporter. Alert on bond_slaves_active < bond_slaves_total — a single degraded bond in a fleet of 500 servers is easy to miss without automated monitoring.
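A minimal sketch of that exporter idea, assuming node_exporter's textfile collector (the output path is deployment-specific); it parses the /proc format shown throughout this page:

```shell
# Emit Prometheus-style member counts from bond status text on stdin.
# $1 = bond name used as the metric label. The "total>0" guard skips
# the bond-level MII Status line that precedes the slave sections.
bond_metrics() {
  awk -v bond="$1" '
    /^Slave Interface:/          { total++ }
    /^MII Status: up/ && total>0 { active++ }
    END {
      printf "bond_slaves_total{bond=\"%s\"} %d\n", bond, total+0
      printf "bond_slaves_active{bond=\"%s\"} %d\n", bond, active+0
    }'
}

# Cron/timer usage (textfile path is an assumption; adjust to taste):
#   bond_metrics bond0 < /proc/net/bonding/bond0 \
#     > /var/lib/node_exporter/textfile_collector/bond0.prom
```

Then alert on bond_slaves_active < bond_slaves_total exactly as the note above suggests.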