Solution: Network Experiencing Broadcast Storm and High CPU on Switches
Triage
- Check switch CPU (if management is still responsive). Look for interrupt-driven CPU (packet processing) rather than process CPU.
- Check for MAC flapping. Identify which MAC addresses are flapping and between which ports.
- Check STP status. Every port should hold an STP role (root, designated, or blocking). If STP is disabled, or every port on a redundant path is forwarding, nothing is stopping the loop.
- Identify high-utilization ports. Look for ports with abnormally high broadcast/multicast input rates.
- Check for the loop source. A MAC address that keeps moving between ports marks the loop path.
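The checks above map onto a handful of show commands. The following is a sketch that assumes Cisco IOS on a Catalyst access switch (the MACFLAP message quoted in this document suggests that platform). Exact syntax and output vary by platform and release, VLAN 10 is a placeholder, and the MAC address is the one from the example log line in dotted notation.

```
! CPU: the first output line reads "five seconds: X%/Y%"; Y is the interrupt share
show processes cpu sorted
show processes cpu history
! MAC flapping: flap notifications are written to the log
show logging | include MACFLAP
! STP: which mode is running, and how many ports block or forward per VLAN
show spanning-tree summary
show spanning-tree vlan 10
! Per-port input counters, including broadcast and multicast packets
show interfaces counters
! Where a flapping MAC is learned right now (run it several times and compare)
show mac address-table address 0050.56ab.1122
```

If SSH is too slow to return output, the same commands can be run from the console port.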
Root Cause
A physical loop was created in the network: someone connected a patch cable between two access ports on the same switch (or between two switches), creating a redundant Layer 2 path.
Spanning Tree Protocol should have detected the loop and placed one port in the blocking state. However, STP had been disabled on the affected VLANs.
This was done previously (possibly to "fix" a convergence delay complaint) without understanding the consequences. Without STP, there is no loop detection, and broadcast frames circulate endlessly, consuming all bandwidth and switch CPU. The MAC address table fills up and thrashes as the same source MAC is learned on multiple ports in rapid succession, further destabilizing forwarding.
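On Cisco IOS, per-VLAN spanning tree is on by default, so a disabled instance shows up as an explicit `no spanning-tree vlan` line in the running configuration. A quick check might look like this; VLAN 10 is a placeholder, since the affected VLAN IDs are not given here.

```
! List every spanning-tree line in the running configuration
show running-config | include spanning-tree
! A line such as the following means STP is disabled for that VLAN:
!   no spanning-tree vlan 10
```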
Fix
Immediate -- Break the Loop:
- Identify the loop ports from the MAC flapping logs. Example: %SW_MATM-4-MACFLAP_NOTIF: Mac 00:50:56:ab:11:22 flap between Gi0/15 and Gi0/22
- Shut down one of the offending ports. CPU should drop immediately.
- Re-enable STP on the affected VLANs.
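Once the log has named the two ports, the shutdown and the STP re-enable might look like the following on Cisco IOS. This is a sketch: Gi0/22 is taken from the example log line, while VLAN 10 is a placeholder for whichever VLANs had STP disabled.

```
configure terminal
 ! Break the loop by shutting one of the two flapping ports
 interface GigabitEthernet0/22
  shutdown
  exit
 ! Re-enable the per-VLAN spanning-tree instance
 spanning-tree vlan 10
 end
! Confirm the CPU has recovered
show processes cpu sorted
```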
Prevent Future Loops:
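configure terminal
 ! Access ports only: exclude trunk/uplink ports from this range
 interface range GigabitEthernet0/1 - 24
  ! Skip the listening/learning delay on edge ports
  spanning-tree portfast
  ! Err-disable the port if a BPDU arrives on it
  spanning-tree bpduguard enable
  ! Act when broadcast traffic exceeds 10% of port bandwidth
  storm-control broadcast level 10
  storm-control action shutdown
 end
! Save the running configuration
write memory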
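- Bring the shut port back up (after STP is enabled). STP will now block one of the redundant paths. If BPDU Guard is already enabled on the port, the port is err-disabled instead as soon as it receives a BPDU over the loop.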
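Bringing the port back and confirming the result might look like the following on Cisco IOS, again using Gi0/22 from the example log line as an illustration.

```
configure terminal
 interface GigabitEthernet0/22
  no shutdown
  end
! Check the STP role and state of the port in each VLAN
show spanning-tree interface GigabitEthernet0/22
! List any ports that have been err-disabled, with the reason
show interfaces status err-disabled
```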
Rollback / Safety
- Shutting down a port is the fastest way to break a loop in an emergency.
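- Re-enabling STP is low-risk once the loop is broken, but ports without portfast may briefly stop forwarding while they pass through listening and learning (about 30 seconds with default 802.1D timers).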
- BPDU Guard will err-disable ports that receive BPDUs -- ensure it is only on access ports, not trunk/uplink ports.
- Storm control shutdown can cause port outages if thresholds are too aggressive; start with 10% and tune.
- After recovery, physically trace and remove the offending cable.
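Two of the cautions above can be checked from the CLI. This sketch assumes Cisco IOS; the 300-second interval is an illustrative value rather than a recommendation from this document.

```
! Current storm-control thresholds and traffic level per port
show storm-control broadcast
! Optional: let ports err-disabled by BPDU Guard recover automatically
configure terminal
 errdisable recovery cause bpduguard
 errdisable recovery interval 300
 end
! Verify which recovery causes are enabled and the timer value
show errdisable recovery
```

Automatic recovery only helps once the offending cable has been removed; otherwise the port trips again as soon as it comes back up.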
Common Traps
- Trying to troubleshoot via SSH when the management plane is saturated -- may need console access to the switch.
- Disabling STP to "fix" convergence delays without understanding that doing so removes all loop protection.
- Enabling portfast on uplink/trunk ports -- portfast should only be on edge/access ports.
- Not enabling BPDU Guard alongside portfast; portfast alone does not prevent loops from rogue switches.
- Assuming the loop is always a physical cable -- it can also be caused by VM bridge misconfigurations, wireless bridges, or transparent firewalls.
- Forgetting to check ALL VLANs -- STP operates per-VLAN (PVST+), and a loop may exist only in specific VLANs.
- Not implementing storm control as a safety net even when STP is properly configured -- defense in depth matters.
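One way to avoid the portfast and BPDU Guard traps above is to set both as global defaults instead of per interface. On Cisco IOS the global portfast default applies only to non-trunking (access) ports, and the global BPDU Guard default applies only to ports in a portfast state. This is a sketch; some releases spell the keyword as `portfast edge`.

```
configure terminal
 ! Portfast on all access (non-trunking) ports by default
 spanning-tree portfast default
 ! BPDU Guard on every port that is in a portfast state
 spanning-tree portfast bpduguard default
 end
! Confirm both defaults and review per-VLAN port state counts
show spanning-tree summary
```

Because the summary lists every active VLAN, it also helps with the per-VLAN check: a VLAN missing from the list has no running STP instance.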