Investigation: Grafana Dashboard Empty, Prometheus Scrape Blocked by NetworkPolicy¶
Phase 1: Observability Investigation (Dead End)¶
Check Prometheus targets:
$ curl -s http://prometheus.monitoring.svc:9090/api/v1/targets | \
jq '.data.activeTargets[] | select(.labels.namespace=="payments") | {endpoint: .scrapeUrl, health: .health, lastError: .lastError}' | head -20
{
"endpoint": "http://10.244.3.42:9090/metrics",
"health": "down",
"lastError": "Get \"http://10.244.3.42:9090/metrics\": context deadline exceeded"
}
{
"endpoint": "http://10.244.3.43:9090/metrics",
"health": "down",
"lastError": "Get \"http://10.244.3.43:9090/metrics\": context deadline exceeded"
}
All targets in the payments namespace are timing out. Check if the metrics endpoint is actually exposed:
# From a debug pod in the payments namespace
$ kubectl run debug --rm -it --image=curlimages/curl -n payments -- \
curl -s http://10.244.3.42:9090/metrics | head -5
# TYPE payment_requests_total counter
payment_requests_total{method="POST",status="200"} 48291
payment_requests_total{method="POST",status="400"} 127
The metrics endpoint works from within the payments namespace. Try from the monitoring namespace (where Prometheus runs):
$ kubectl run debug --rm -it --image=curlimages/curl -n monitoring -- \
curl -s --connect-timeout 5 http://10.244.3.42:9090/metrics
curl: (28) Connection timed out after 5001 milliseconds
The connection times out from the monitoring namespace to the payments namespace on port 9090. But the application port works:
$ kubectl run debug --rm -it --image=curlimages/curl -n monitoring -- \
curl -s --connect-timeout 5 http://payment-service.payments.svc.cluster.local:8080/health
{"status": "healthy"}
Port 8080 (application) works cross-namespace. Port 9090 (metrics) does not. This is not a Prometheus or Grafana issue — it is a network-level block.
The Pivot¶
Check NetworkPolicies in the payments namespace:
$ kubectl get networkpolicy -n payments
NAME POD-SELECTOR AGE
payments-default-deny <none> 45m
payments-allow-ingress app=payment-svc 45m
These NetworkPolicies are 45 minutes old, created around 07:30 UTC, which matches exactly when the dashboards went blank.
Phase 2: Kubernetes Investigation (Root Cause)¶
Inspect the NetworkPolicies:
$ kubectl describe networkpolicy payments-default-deny -n payments
Name: payments-default-deny
Namespace: payments
Spec:
PodSelector: <none> (applies to all pods)
Allowing ingress traffic:
<none> (traffic not allowed)
Allowing egress traffic:
<none> (traffic not allowed)
Policy Types: Ingress, Egress
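For reference, the describe output above corresponds to a manifest along these lines (a reconstructed sketch from the describe output, not the actual applied file):

```yaml
# Sketch of the default-deny policy: an empty podSelector matches every pod
# in the namespace, and declaring both policy types with no ingress/egress
# rules denies all traffic in both directions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-default-deny
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```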
A default-deny NetworkPolicy was applied to all pods in the payments namespace. Now check the allow rule:
$ kubectl describe networkpolicy payments-allow-ingress -n payments
Name: payments-allow-ingress
Namespace: payments
Spec:
PodSelector: app=payment-svc
Allowing ingress traffic:
To Port: 8080/TCP
From:
NamespaceSelector: {} (all namespaces)
Policy Types: Ingress
The allow rule permits ingress on port 8080 from all namespaces. But there is no rule allowing ingress on port 9090 (the metrics port). Prometheus scrapes on port 9090 are blocked by the default-deny policy.
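A minimal fix would add an ingress rule for the metrics port. The sketch below assumes the pod label `app=payment-svc` from the describe output above, and that the monitoring namespace carries the default `kubernetes.io/metadata.name` label (set automatically on Kubernetes 1.21+); restricting the source to the monitoring namespace is tighter than the existing all-namespaces rule on port 8080:

```yaml
# Sketch: a dedicated policy allowing Prometheus scrapes on the metrics port.
# Assumes the monitoring namespace has the default kubernetes.io/metadata.name
# label; adjust the selector if your cluster labels namespaces differently.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-allow-metrics
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-svc
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 9090
```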
# Check who created these policies
$ kubectl get networkpolicy payments-default-deny -n payments \
-o jsonpath='{.metadata.annotations}'
{"kubectl.kubernetes.io/last-applied-configuration":"...","author":"security-team","ticket":"SEC-5012"}
The security team applied these NetworkPolicies 45 minutes ago as part of a namespace hardening initiative (SEC-5012). They allowed the application port but forgot the metrics port.
Domain Bridge: Why This Crossed Domains¶
Key insight: The symptom was empty Grafana dashboards (observability), the root cause was a NetworkPolicy blocking Prometheus scrapes (kubernetes_ops, applied by the security team), and the fix requires updating the NetworkPolicy to allow metrics-port traffic (networking). This pattern is common because observability depends on network connectivity between the monitoring stack and the workloads it scrapes. NetworkPolicies that allow only application traffic silently break the metrics pipeline, a frequent blind spot in namespace hardening initiatives.
Root Cause¶
A security hardening initiative applied a default-deny NetworkPolicy to the payments namespace with an allow rule only for port 8080 (application traffic). Port 9090 (Prometheus metrics) was not included in the allow list. Prometheus in the monitoring namespace could no longer scrape metrics from payments pods, causing all targets to appear as DOWN and Grafana dashboards to show "No data."
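This class of failure can be detected faster than by waiting for blank dashboards. The sketch below is a hypothetical Prometheus alerting rule (the alert name, threshold, and `for` duration are assumptions, not from this incident) built on the standard `up` metric, which Prometheus sets to 0 for every failed scrape target:

```yaml
# Sketch: alert when every scrape target in a namespace is down at once,
# the signature of a network-level block rather than a single unhealthy pod.
groups:
  - name: scrape-health
    rules:
      - alert: NamespaceScrapeBlackout
        expr: count(up{namespace="payments"} == 0) == count(up{namespace="payments"})
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "All Prometheus targets in payments are down; check NetworkPolicies"
```

All-targets-down distinguishes a connectivity problem (as here) from an individual crashing pod, which would trip only a per-target alert.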