Symptoms: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy¶
Domains: observability | kubernetes_ops | networking · Level: L2 · Estimated time: 30-45 min
Initial Alert¶
On-call Slack message at 08:15 UTC from the platform team:
All Grafana dashboards for the `payments` namespace are showing "No data"
since approximately 07:30 UTC. Other namespaces look fine.
Prometheus is up and running.
Observable Symptoms¶
- Grafana dashboards for `payments` namespace services show "No data" for all panels since 07:30 UTC.
- Grafana dashboards for the `orders`, `users`, and `monitoring` namespaces are normal.
- The Prometheus UI (`/targets`) shows all `payments` namespace scrape targets as `DOWN` with error: `context deadline exceeded`.
- Prometheus itself is healthy — `/metrics` and `/api/v1/query` respond normally.
- The `payments` namespace services are running and healthy — they respond to HTTP requests on their application ports. `curl http://payment-service.payments.svc.cluster.local:9090/metrics` from a debug pod in the `payments` namespace succeeds.
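The symptoms above suggest a quick triage sequence. A sketch, under some assumptions: service and namespace names are taken from the alert, Prometheus is assumed to run as `svc/prometheus` in the `monitoring` namespace, and the `nicolaka/netshoot` debug image is an illustrative choice — substitute your own.

```shell
# List DOWN targets and their last scrape error, via the Prometheus HTTP API.
kubectl -n monitoring port-forward svc/prometheus 9091:9090 &
curl -s 'http://localhost:9091/api/v1/targets' | jq '.data.activeTargets[]
  | select(.health == "down")
  | {job: .labels.job, namespace: .labels.namespace, lastError: .lastError}'

# The scrape endpoint works from INSIDE the payments namespace...
kubectl -n payments run debug --rm -it --image=nicolaka/netshoot -- \
  curl -sv --max-time 5 http://payment-service.payments.svc.cluster.local:9090/metrics

# ...so repeat the exact same request from the namespace Prometheus lives in.
# A timeout here (mirroring "context deadline exceeded") points at the network
# path between namespaces, not at the service itself.
kubectl -n monitoring run debug --rm -it --image=nicolaka/netshoot -- \
  curl -sv --max-time 5 http://payment-service.payments.svc.cluster.local:9090/metrics
```

The key comparison is the last two commands: the only variable changed between them is the source namespace, which isolates the failure to something namespace-scoped on the ingress path.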
The Misleading Signal¶
Empty Grafana dashboards point to an observability problem — maybe a Grafana data source misconfiguration, a Prometheus query issue, or a PromQL syntax error. The fact that Prometheus shows targets as DOWN shifts suspicion to the target services — maybe they stopped exporting metrics or the metrics endpoint changed. The services being "healthy" on their application port but DOWN on the metrics port makes engineers suspect a recent deployment changed the metrics configuration.
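The title already names the real cause. For reference, the kind of policy that produces exactly this symptom pattern is a same-namespace-only ingress allowlist in `payments` — a hedged sketch with illustrative names, not the actual manifest from this incident:

```yaml
# Illustrative only. A policy of this shape allows in-namespace traffic
# (so the debug-pod curl from payments succeeds) while silently dropping
# Prometheus's cross-namespace scrapes. Dropped packets time out instead
# of being rejected, which is why targets fail with
# "context deadline exceeded" rather than "connection refused".
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace   # hypothetical name
  namespace: payments
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}      # a bare podSelector matches pods in THIS
                               # namespace only; without a namespaceSelector,
                               # traffic from monitoring is dropped
```

Note how each misleading signal is explained by the policy: Grafana and Prometheus are fine, the services are fine, and the in-namespace check passes — only the cross-namespace scrape path is cut.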