Grading Checklist
A good response must include:
- Identifies the root cause: the aggressive hold timer (15s) combined with intermittent packet loss on a degraded link causes keepalive misses that trigger session drops
- Explains how BGP hold timers work: if no keepalive or update is received within the hold time, the session is declared down
- Correlates the link errors (CRC, input errors from the media converter) with keepalive packet loss
- Notes that missing 3 consecutive keepalives (sent at 5s intervals) is enough to expire the 15s hold timer, which even modest packet loss can easily trigger
- Proposes the two-part fix: (1) fix the physical layer issue (media converter) and (2) increase hold timer to a reasonable value
- Recommends standard timer values (90s hold time with 30s keepalive, or 60s/20s) unless BFD is available for fast failover
- Suggests using BFD for sub-second failure detection instead of aggressive BGP timers
- Recommends investigating the media converter (replace it, check SFP, check cable)
- Mentions route flap dampening as a mitigation for flapping routes affecting downstream routers
- Shows how to read the BGP event log to confirm hold timer expiry as the cause
- Warns that aggressive timers without a clean link cause more outages than they prevent
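
For graders who want to sanity-check the keepalive arithmetic in the checklist, the following is a rough sketch rather than a model of any particular router. It assumes each keepalive is lost independently with a fixed probability, and it ignores TCP retransmission, keepalive jitter, and UPDATE messages that would also reset the hold timer, so it only indicates the trend. The function name `p_session_drop` and its parameters are hypothetical and do not come from the scenario.

```python
def p_session_drop(loss_rate: float, hold_time_s: int,
                   keepalive_s: int, window_s: int) -> float:
    """Estimate the chance that a BGP session drops at least once in window_s.

    Simplifying assumption: the hold timer expires when
    hold_time_s // keepalive_s consecutive keepalives are lost, and each
    keepalive is lost independently with probability loss_rate.
    """
    misses_needed = hold_time_s // keepalive_s   # e.g. 15 // 5 = 3
    keepalives_sent = window_s // keepalive_s

    # state[j] = probability the session is still up and the last j
    # keepalives in a row were lost.
    state = [0.0] * misses_needed
    state[0] = 1.0

    for _ in range(keepalives_sent):
        nxt = [0.0] * misses_needed
        # Any received keepalive resets the run of losses to zero.
        nxt[0] = sum(state) * (1.0 - loss_rate)
        # A lost keepalive extends the run by one; a loss from the last
        # state is the hold timer expiry and leaves the "still up" set.
        for j in range(misses_needed - 1):
            nxt[j + 1] = state[j] * loss_rate
        state = nxt

    return 1.0 - sum(state)


if __name__ == "__main__":
    # Compare the aggressive 15s/5s timers with 90s/30s over one hour
    # at a hypothetical 10% loss rate.
    for hold, keepalive in [(15, 5), (90, 30)]:
        p = p_session_drop(0.10, hold, keepalive, 3600)
        print(f"hold={hold}s keepalive={keepalive}s -> P(drop in 1h) = {p:.3f}")
```

Under these assumptions both timer pairs need three consecutive losses, but the 15s/5s pair sends six times as many keepalives per hour, so it has six times as many chances to hit such a run. That is the trend behind the warning that aggressive timers on an unclean link cause more outages than they prevent.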