Diagnostic Questions

Work through these questions before revealing the investigation path:

  1. Grafana shows "No data" for one namespace but others are fine. Prometheus shows those targets as DOWN. Is this more likely a Grafana configuration issue, a Prometheus issue, or a target-side issue? How do you narrow it down?

  2. Metrics scrapes succeed from within the payments namespace but time out from the monitoring namespace. The application port (8080) is reachable cross-namespace but the metrics port (9090) is not. What Kubernetes resource could cause port-specific cross-namespace blocking?

  3. A default-deny NetworkPolicy was applied 45 minutes ago, which coincides with the start of the outage. The accompanying allow rule includes only port 8080. Why was port 9090 missed, and whose responsibility is it to ensure observability ports are included?

  4. The fix is a NetworkPolicy change (networking domain) rather than a Prometheus configuration change (observability) or a pod annotation change (kubernetes). Why is the network layer the correct place to fix this?

  5. How would you design a namespace hardening process that does not break observability? What standard template or automated check would catch this before it impacts dashboards?
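
As one possible starting point for the automated check in question 5, the sketch below inspects the NetworkPolicy manifests for a namespace (already parsed into dictionaries) and reports which required ports no ingress rule allows. The function names `rule_allows_port` and `missing_ports` and the example policies are hypothetical, not part of any tool. It is deliberately simplified: it ignores `from` selectors, `podSelector` matching, protocols, and named ports, so a real pre-merge check would need to cover those as well.

```python
from typing import Any

Policy = dict[str, Any]


def rule_allows_port(rule: dict[str, Any], port: int) -> bool:
    """Return True if a single ingress rule admits traffic on `port`."""
    ports = rule.get("ports")
    if not ports:
        # A rule with no `ports` list matches every port.
        return True
    for entry in ports:
        start = entry.get("port")
        if start is None:
            # An entry without `port` matches all ports.
            return True
        if not isinstance(start, int):
            # Named ports cannot be resolved without the pod spec; skip them.
            continue
        end = entry.get("endPort", start)
        if start <= port <= end:
            return True
    return False


def missing_ports(policies: list[Policy], required: list[int]) -> list[int]:
    """List the required ports that no ingress rule in `policies` allows.

    Assumption: every policy passed in applies to the pods of interest.
    """
    ingress_policies = [
        p for p in policies
        if "Ingress" in p.get("spec", {}).get("policyTypes", ["Ingress"])
    ]
    if not ingress_policies:
        # No ingress policy means the pods are not isolated for ingress.
        return []
    allowed = set()
    for policy in ingress_policies:
        for rule in policy["spec"].get("ingress") or []:
            allowed.update(p for p in required if rule_allows_port(rule, p))
    return sorted(set(required) - allowed)


# Hypothetical manifests mirroring the scenario: default deny plus an
# allow rule that lists only the application port.
default_deny: Policy = {
    "metadata": {"name": "default-deny", "namespace": "payments"},
    "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
}
allow_app: Policy = {
    "metadata": {"name": "allow-app", "namespace": "payments"},
    "spec": {
        "podSelector": {},
        "policyTypes": ["Ingress"],
        "ingress": [{"ports": [{"protocol": "TCP", "port": 8080}]}],
    },
}

print(missing_ports([default_deny, allow_app], [8080, 9090]))  # [9090]
```

Run against every namespace that is about to receive a default-deny policy, a check of this kind could fail the change when the metrics port is absent from the allow rules, before any dashboard goes blank.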