Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy¶
Domains: observability | kubernetes_ops | networking · Level: L2 · Estimated time: 30-45 min
Initial Alert¶
On-call Slack message at 08:15 UTC from the platform team:
All Grafana dashboards for the `payments` namespace are showing "No data"
since approximately 07:30 UTC. Other namespaces look fine.
Prometheus is up and running.
Observable Symptoms¶
- Grafana dashboards for `payments` namespace services show "No data" for all panels since 07:30 UTC.
- Grafana dashboards for the `orders`, `users`, and `monitoring` namespaces are normal.
- The Prometheus UI (`/targets`) shows all `payments` namespace scrape targets as `DOWN` with error: `context deadline exceeded`.
- Prometheus itself is healthy — `/metrics` and `/api/v1/query` respond normally.
- The `payments` namespace services are running and healthy — they respond to HTTP requests on their application ports. `curl http://payment-service.payments.svc.cluster.local:9090/metrics` from a debug pod in the `payments` namespace succeeds.
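The symptoms above suggest a quick triage sequence. A sketch, under some assumptions: service and namespace names are taken from the alert, Prometheus is assumed to run as `svc/prometheus` in the `monitoring` namespace, and the `nicolaka/netshoot` debug image is an illustrative choice — substitute your own.

```shell
# List DOWN targets and their last scrape error, via the Prometheus HTTP API.
kubectl -n monitoring port-forward svc/prometheus 9091:9090 &
curl -s 'http://localhost:9091/api/v1/targets' | jq '.data.activeTargets[]
  | select(.health == "down")
  | {job: .labels.job, namespace: .labels.namespace, lastError: .lastError}'

# The scrape endpoint works from INSIDE the payments namespace...
kubectl -n payments run debug --rm -it --image=nicolaka/netshoot -- \
  curl -sv --max-time 5 http://payment-service.payments.svc.cluster.local:9090/metrics

# ...so repeat the exact same request from the namespace Prometheus lives in.
# A timeout here (mirroring "context deadline exceeded") points at the network
# path between namespaces, not at the service itself.
kubectl -n monitoring run debug --rm -it --image=nicolaka/netshoot -- \
  curl -sv --max-time 5 http://payment-service.payments.svc.cluster.local:9090/metrics
```

The key comparison is the last two commands: the only variable changed between them is the source namespace, which isolates the failure to something namespace-scoped on the ingress path.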
The Misleading Signal¶
Empty Grafana dashboards point to an observability problem — maybe a Grafana data source misconfiguration, a Prometheus query issue, or a PromQL syntax error. The fact that Prometheus shows targets as DOWN shifts suspicion to the target services — maybe they stopped exporting metrics or the metrics endpoint changed. The services being "healthy" on their application port but DOWN on the metrics port makes engineers suspect a recent deployment changed the metrics configuration.
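The title already names the real cause. For reference, the kind of policy that produces exactly this symptom pattern is a same-namespace-only ingress allowlist in `payments` — a hedged sketch with illustrative names, not the actual manifest from this incident:

```yaml
# Illustrative only. A policy of this shape allows in-namespace traffic
# (so the debug-pod curl from payments succeeds) while silently dropping
# Prometheus's cross-namespace scrapes. Dropped packets time out instead
# of being rejected, which is why targets fail with
# "context deadline exceeded" rather than "connection refused".
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace   # hypothetical name
  namespace: payments
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}      # a bare podSelector matches pods in THIS
                               # namespace only; without a namespaceSelector,
                               # traffic from monitoring is dropped
```

Note how each misleading signal is explained by the policy: Grafana and Prometheus are fine, the services are fine, and the in-namespace check passes — only the cross-namespace scrape path is cut.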