Diagnostic Questions¶
Before revealing the investigation path:¶
-
The checkout-api p99 latency spiked but only for the
/v1/checkoutendpoint. All other endpoints are fine. What does this tell you about where the problem likely is — the application, the infrastructure, or a downstream dependency? -
curlfrom inside the cluster takes 3-5 seconds to reach the payment gateway, butcurlfrom a home network takes 90ms. What does this asymmetry suggest about the nature of the problem? -
A traceroute from the cluster shows traffic going through an unexpected transit AS with high latency. What would you check on the edge router to understand why the preferred path is not being used?
-
The BGP prefix-list was added in response to a security ticket. The security intent was to block inbound scanning, but the implementation blocked outbound route selection. Why is the correct fix a security ACL rather than simply removing the prefix-list?
-
How would you prevent a security change request from accidentally breaking network routing in the future? What review process or technical controls would help?