Incident Replay: Cable Plugged Into Wrong Port¶
Setup¶
- System context: New server rack-and-stack in a production datacenter. 8 servers being cabled to ToR switches. Patch panel labels were printed but not yet verified.
- Time: Thursday 14:30 UTC
- Your role: Datacenter technician
Round 1: Alert Fires¶
[Pressure cue: "Network team reports server db-prod-07 is on the wrong VLAN. Application team sees cross-talk from a staging environment. You have 15 minutes before they escalate."]
What you see: Server db-prod-07 can ping staging hosts it should not reach. The VLAN assignment on the switch port is correct for port Gi1/0/23, but the server is not plugged into Gi1/0/23 — it is in Gi1/0/31 (a staging VLAN port).
Choose your action:
- A) Change the VLAN on Gi1/0/31 to match production
- B) Physically trace the cable from db-prod-07 to verify which switch port it lands on
- C) Shut down the server's network interface to stop the cross-talk immediately
- D) Ask the network team to run a MAC-address search to find the actual port
If you chose A:¶
[Result: You move the VLAN but now whatever was supposed to be on Gi1/0/31 is broken. You have shifted the problem to another server.]
If you chose B (recommended):¶
[Result: Cable trace confirms db-prod-07's eth0 is patched to panel port 31, which maps to switch Gi1/0/31 (staging VLAN). The patch panel label says "23" but the cable runs to position 31. Mislabel. Proceed to Round 2.]
If you chose C:¶
[Result: Cross-talk stops but the production database is now offline. You have traded a security issue for an availability issue.]
If you chose D:¶
[Result: `show mac address-table | include <mac>` shows the MAC on Gi1/0/31. This confirms the wrong port, but you still need to trace the physical cable. Adds 5 minutes.]
Round 2: First Triage Data¶
[Pressure cue: "Security team is now involved — production data may have been exposed to staging. They want the physical fix confirmed."]
What you see: Patch panel label at position 23 is wrong — the actual cable run goes to position 31. Two cables were swapped during the rack-and-stack. Server db-prod-08 (which should be on Gi1/0/31 staging) is also on the wrong port.
Choose your action:
- A) Swap the two cables at the patch panel to correct both servers
- B) Swap the cables at the switch side instead
- C) Re-label the patch panel to match current cabling and update the switch VLANs
- D) Pull both cables and re-run them correctly
If you chose A (recommended):¶
[Result: Cables swapped at patch panel. db-prod-07 now on correct switch port Gi1/0/23 (production VLAN). db-prod-08 on Gi1/0/31 (staging). Both servers verify correct connectivity. Proceed to Round 3.]
If you chose B:¶
[Result: Also works but switch-side cable management is often tighter and harder to reroute. Takes longer and risks disturbing adjacent connections.]
If you chose C:¶
[Result: This "fixes" things logically but the physical documentation is now permanently wrong. The next technician will be confused. Creates future incidents.]
If you chose D:¶
[Result: Overkill — a simple swap takes 30 seconds. Re-running cables takes 20+ minutes and requires new cable management ties.]
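The fault in Round 2 is a classic two-element transposition, and a simple expected-vs-observed comparison flags both servers at once. A hypothetical check (the server names and positions come from this scenario; the dictionaries themselves are illustrative):

```python
# Expected patch-panel position per server vs. what the cable trace found.
expected = {"db-prod-07": 23, "db-prod-08": 31}
observed = {"db-prod-07": 31, "db-prod-08": 23}  # traced in Rounds 1-2

# Any server whose traced position differs from the plan is mispatched.
mismatches = {s: (expected[s], observed[s])
              for s in expected if expected[s] != observed[s]}

print(mismatches)  # both servers flagged: a pairwise swap
```

Seeing both servers in the mismatch report is what justifies the single swap in option A rather than two independent fixes.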
Round 3: Root Cause Identification¶
[Pressure cue: "Incident commander asks for root cause to put in the post-mortem."]
What you see: Root cause: During rack-and-stack, two patch panel cables were swapped. The labeling was done before cable verification. No post-cabling verification step existed in the SOP.
Choose your action:
- A) Add a cable verification step to the rack-and-stack SOP
- B) Implement automated MAC-to-port verification in provisioning
- C) Add LLDP/CDP verification as part of server commissioning
- D) All of the above
If you chose D (recommended):¶
[Result: All three controls address different layers — physical verification, automated detection, and protocol-based validation. Defense in depth. Proceed to Round 4.]
If you chose A:¶
[Result: Manual verification helps but humans still make mistakes. Partial fix.]
If you chose B:¶
[Result: Automated checks catch issues faster but require tooling investment. Good but incomplete alone.]
If you chose C:¶
[Result: LLDP verification is excellent for detecting mispatched cables, but it only works if LLDP is enabled on the switches. Partial fix alone.]
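The LLDP check in option C boils down to comparing the advertised remote port against the cabling plan. A sketch, assuming `lldpctl -f json` output shaped like the sample below (real lldpd output nests differently across versions, so treat the JSON layout as an assumption):

```python
import json

# Hypothetical, simplified lldpctl -f json output -- invented for this sketch.
lldp_json = json.loads("""
{"lldp": {"interface": {"eth0": {"port": {"descr": "Gi1/0/23"}}}}}
""")

def remote_port(data: dict, iface: str) -> str:
    """Return the switch port a NIC is patched into, per the LLDP neighbor."""
    return data["lldp"]["interface"][iface]["port"]["descr"]

print(remote_port(lldp_json, "eth0"))  # Gi1/0/23
```

Commissioning would then compare this value against the planned port for the server and fail the build on mismatch, which is exactly the control that would have caught this incident before go-live.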
Round 4: Remediation¶
[Pressure cue: "Verify and close the incident."]
Actions:
1. Verify db-prod-07 is on the correct VLAN: `ip addr show eth0` and test connectivity to production hosts
2. Verify db-prod-08 is on the correct VLAN similarly
3. Update patch panel labels to match verified cabling
4. Run `lldpctl` on both servers to confirm switch port identity
5. Audit all 8 servers in the rack for correct port assignments
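Step 5, the rack-wide audit, can be sketched as a loop comparing LLDP-observed ports against the cabling plan. The server names and expected-port table below are invented for illustration:

```python
# Illustrative expected switch port per server (assumed, not from this rack).
expected_ports = {f"srv-{n:02d}": f"Gi1/0/{20 + n}" for n in range(1, 9)}

def audit(observed_ports: dict) -> list:
    """Return servers whose observed port differs from the cabling plan."""
    return [s for s, port in expected_ports.items()
            if observed_ports.get(s) != port]

# One deliberate mismatch to show the report shape:
observed = dict(expected_ports)
observed["srv-03"] = "Gi1/0/99"

print(audit(observed))  # ['srv-03']
```

An empty audit result is the closing evidence for the incident: every server in the rack is on its planned port.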
Damage Report¶
- Total downtime: 0 (services were running, but on wrong network segment)
- Blast radius: Potential data exposure between production and staging for ~2 hours
- Optimal resolution time: 10 minutes (trace cable -> swap at patch panel -> verify)
- If every wrong choice was made: 45+ minutes plus security incident from prolonged cross-VLAN exposure
Cross-References¶
- Primer: Datacenter & Server Hardware
- Primer: VLANs
- Footguns: Networking