
Incident Replay: Multicast Not Crossing Router

Setup

  • System context: Video streaming application using multicast (239.1.1.1) for internal broadcasts. Clients on the same VLAN as the source receive the stream. Clients on other VLANs get nothing.
  • Time: Thursday 13:00 UTC
  • Your role: Network engineer

Round 1: Alert Fires

[Pressure cue: "Conference room displays on VLAN 300 cannot receive the multicast video stream. Displays on VLAN 100 (same as source) work fine."]

What you see: Multicast traffic is L2 only — it is not being routed between VLANs. The router has no multicast routing (PIM) configured. IGMP snooping is enabled on switches but the router is not an IGMP querier for VLAN 300.

Choose your action:
  • A) Apply a quick workaround to restore service
  • B) Investigate the root cause systematically
  • C) Escalate to the vendor or upstream provider
  • D) Check if a recent change caused the issue

If you chose A:

[Result: Workaround provides temporary relief but masks the underlying issue. You will need to circle back.]

If you chose B:

[Result: Systematic investigation reveals the root cause: multicast routing was never configured because the original deployment had all clients on one VLAN. When VLANs were added, nobody enabled PIM/IGMP on the router interfaces. Proceed to Round 2.]

If you chose C:

[Result: Vendor/upstream confirms the issue is on your side. Time wasted on external coordination.]

If you chose D:

[Result: Change log review helps narrow the timeline but does not directly identify the technical cause. Partial progress.]
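The Round 1 findings can be condensed into a deliberately simplified reachability model: same-VLAN receivers are served at Layer 2 by the switch, while cross-VLAN receivers depend on the router having multicast routing, a PIM sparse-mode rendezvous point, PIM on both SVIs, and a receiver that has joined the group. This is a minimal sketch, not a simulation of PIM or IGMP; the names RouterState and receives_stream are hypothetical, and it ignores IGMP snooping timers, RPF checks, TTL, and querier election.

```python
from dataclasses import dataclass, field


@dataclass
class RouterState:
    """Hypothetical, simplified view of the inter-VLAN router's multicast config."""
    multicast_routing: bool = False   # global multicast routing enabled?
    rp_configured: bool = False       # PIM sparse mode needs an RP for groups like 239.1.1.1
    pim_svis: set[int] = field(default_factory=set)  # VLAN IDs whose SVI runs PIM


def receives_stream(state: RouterState, source_vlan: int, receiver_vlan: int,
                    receiver_joined: bool = True) -> bool:
    """Return True if a receiver on receiver_vlan should get the stream (simplified)."""
    if receiver_vlan == source_vlan:
        # Same VLAN: frames are switched at Layer 2; the router is not involved.
        return True
    # Different VLAN: the router must route the group from one SVI to the other.
    return (
        state.multicast_routing
        and state.rp_configured
        and source_vlan in state.pim_svis
        and receiver_vlan in state.pim_svis
        and receiver_joined
    )


if __name__ == "__main__":
    as_found = RouterState()  # no multicast routing, no PIM, no RP
    print(receives_stream(as_found, source_vlan=100, receiver_vlan=100))  # True
    print(receives_stream(as_found, source_vlan=100, receiver_vlan=300))  # False
    fixed = RouterState(multicast_routing=True, rp_configured=True, pim_svis={100, 300})
    print(receives_stream(fixed, source_vlan=100, receiver_vlan=300))     # True
```

The model makes the asymmetry in the alert explicit: VLAN 100 never needed the router, so it kept working while every cross-VLAN path failed.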

Round 2: First Triage Data

[Pressure cue: "Root cause identified. Apply the fix."]

What you see: Multicast routing was never configured because the original deployment had all clients on one VLAN. When VLANs were added, nobody enabled PIM/IGMP on the router interfaces.

Choose your action:
  • A) Apply the targeted fix
  • B) Apply the fix and verify with testing
  • C) Apply a broader fix that addresses the class of problem
  • D) Document and schedule the fix for the next maintenance window

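If you chose B:

[Result: Enable multicast routing globally on the inter-VLAN router, then enable PIM sparse mode on each VLAN SVI, including the source's VLAN 100 and the receivers' VLAN 300. On Cisco IOS, enabling PIM on an interface also enables IGMP on it, so the router can act as IGMP querier for those VLANs. Sparse mode needs a rendezvous point (RP) for 239.1.1.1, so configure one if none exists. Verify with show ip mroute that 239.1.1.1 has an entry whose outgoing interface list includes the VLAN 300 SVI. Service restored and verified. Proceed to Round 3.]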

If you chose A:

[Result: Fix applied but not verified. May not be complete.]

If you chose C:

[Result: Broader fix is correct long-term but takes longer to implement during an incident.]

If you chose D:

[Result: Delaying the fix extends the outage or degradation. Apply now if possible.]
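The Round 2 fix can be rendered from a list of SVIs so that configuration management applies it consistently. This is a minimal Python sketch assuming a Cisco IOS-style router (the use of show ip mroute suggests IOS, but the platform is not stated). The helper name pim_fix_config and the static RP address 10.0.0.1 are hypothetical placeholders; sparse mode needs some RP for 239.1.1.1, and this sketch simply assumes a static one.

```python
def pim_fix_config(svi_vlans: list[int], rp_address: str) -> list[str]:
    """Render IOS-style config lines that route multicast between the given VLAN SVIs.

    Hypothetical helper for illustration; check commands against your platform docs.
    """
    lines = [
        # Global multicast routing. Some IOS XE platforms require
        # "ip multicast-routing distributed" instead.
        "ip multicast-routing",
        # Static rendezvous point for sparse mode (placeholder address).
        f"ip pim rp-address {rp_address}",
    ]
    for vlan in sorted(set(svi_vlans)):
        lines += [
            f"interface Vlan{vlan}",
            # On IOS, enabling PIM on an interface also enables IGMP on it.
            " ip pim sparse-mode",
            "!",
        ]
    return lines


if __name__ == "__main__":
    # Source on VLAN 100, conference room displays on VLAN 300.
    print("\n".join(pim_fix_config([100, 300], rp_address="10.0.0.1")))
```

Generating the lines from the VLAN list is the point of the design: when a new VLAN is added to the list, its SVI gets PIM by default instead of depending on someone remembering it, which is exactly how this incident started.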

Round 3: Root Cause Identification

[Pressure cue: "Service restored. Document and prevent recurrence."]

What you see: Root cause confirmed: Multicast routing was never configured because the original deployment had all clients on one VLAN. When VLANs were added, nobody enabled PIM/IGMP on the router interfaces.

Choose your action:
  • A) Document the fix in the runbook
  • B) Add monitoring to detect this condition
  • C) Add the fix to automation/configuration management
  • D) All of the above

If you chose D:

[Result: Documentation, monitoring, and automation all updated. Defense in depth prevents recurrence. Proceed to Round 4.]

If you chose A:

[Result: Documentation helps but relies on humans remembering to check it.]

If you chose B:

[Result: Monitoring detects faster but does not prevent.]

If you chose C:

[Result: Automation prevents recurrence but needs monitoring for edge cases.]
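For the monitoring piece, one option is a drift check that scans a router's running-config text (collected however your tooling already does it) for VLAN SVIs that lack ip pim sparse-mode. This is a minimal sketch; the function name svis_missing_pim is hypothetical, it assumes the usual IOS layout where interface sub-commands are indented by one space, and it only recognises that exact line. Not every VLAN necessarily needs multicast, so treat a hit as "review this SVI" rather than a confirmed fault.

```python
import re


def svis_missing_pim(running_config: str) -> list[str]:
    """Return SVI names (e.g. 'Vlan300') whose config block lacks 'ip pim sparse-mode'."""
    missing: list[str] = []
    current = None   # SVI block we are currently inside, if any
    has_pim = False
    for line in running_config.splitlines():
        header = re.match(r"^interface (Vlan\d+)\s*$", line)
        if header or (line and not line.startswith(" ")):
            # Any new top-level line closes the previous interface block.
            if current is not None and not has_pim:
                missing.append(current)
            current = header.group(1) if header else None
            has_pim = False
        elif current is not None and line.strip() == "ip pim sparse-mode":
            has_pim = True
    # Close the final block if the config ends inside an SVI.
    if current is not None and not has_pim:
        missing.append(current)
    return missing


if __name__ == "__main__":
    # Illustrative config; the addresses are made up.
    sample = (
        "ip multicast-routing\n!\n"
        "interface Vlan100\n ip address 10.1.100.1 255.255.255.0\n ip pim sparse-mode\n!\n"
        "interface Vlan300\n ip address 10.1.3.1 255.255.255.0\n!\n"
    )
    print(svis_missing_pim(sample))  # ['Vlan300']
```

Run against every router's config, the same check also serves the "check for similar issues across the infrastructure" step.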

Round 4: Remediation

[Pressure cue: "Verify everything and close the incident."]

Actions:
  1. Verify service is functioning correctly end-to-end
  2. Verify monitoring detects the condition
  3. Update runbooks and configuration management
  4. Schedule post-mortem review
  5. Check for similar issues across the infrastructure
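For the end-to-end check, a host on VLAN 300 can join 239.1.1.1 itself and count what arrives. This is a minimal Python sketch using the standard socket module; the UDP port 5004 is a hypothetical placeholder because the stream's port is not given, and build_mreq and count_datagrams are illustrative names. Receiving datagrams only shows that routing and IGMP work; it does not prove the video decodes on the displays.

```python
import socket
import struct
import time

GROUP = "239.1.1.1"  # group from the incident
PORT = 5004          # hypothetical: the stream's UDP port is not stated


def build_mreq(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(local_ip))


def count_datagrams(group: str, port: int, seconds: float = 5.0,
                    local_ip: str = "0.0.0.0") -> int:
    """Join the group and count datagrams received within `seconds`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Wildcard bind also accepts unicast to this port; fine for a smoke test.
        sock.bind(("", port))
        # Joining makes the kernel send an IGMP membership report for the group.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, local_ip))
        received = 0
        deadline = time.monotonic() + seconds
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                sock.recvfrom(65535)
            except socket.timeout:
                break
            received += 1
        return received
    finally:
        sock.close()  # closing the socket drops the group membership


if __name__ == "__main__":
    n = count_datagrams(GROUP, PORT)
    print(f"received {n} datagrams for {GROUP}:{PORT}")
    if n == 0:
        print("nothing arrived: check PIM, the RP, and IGMP on this VLAN's SVI")
```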

Damage Report

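  • Total downtime: Varies with the path chosen
  • Blast radius: Conference room displays and other receivers on VLAN 300, plus any client on a VLAN other than the source's VLAN 100; clients on VLAN 100 were unaffected
  • Optimal resolution time: 15 minutes
  • If every wrong choice was made: 90 minutes, plus additional damage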

Cross-References