Progressive Hints

Hint 1 (after 5 min)

Look at the Prometheus rule evaluation metrics: prometheus_rule_group_last_duration_seconds and prometheus_rule_group_last_evaluation_timestamp_seconds are both 0 for the api-server.rules group. This means Prometheus has never evaluated that rule group: the API server alerting rules are not loaded at all.
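You can check this with an instant query in the Prometheus UI. A sketch (the exact rule_group label value is typically qualified by the rule file's path, so a regex match is safest):

```promql
# Zero duration and a zero last-evaluation timestamp both indicate
# the group has never been evaluated:
prometheus_rule_group_last_duration_seconds{rule_group=~".*api-server.rules.*"}
prometheus_rule_group_last_evaluation_timestamp_seconds{rule_group=~".*api-server.rules.*"}
```

If the queries return no series at all, that is an even stronger signal: Prometheus only exports these metrics for rule groups it has actually loaded.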

Hint 2 (after 10 min)

The Prometheus log says "No rule files found matching pattern /etc/prometheus/rules/*.yaml". The Helm values configure rule_files to load from /etc/prometheus/rules/*.yaml, but a comment in the config notes that the rule_files key was moved during a config refactor on Dec 1. Either the rule files now exist at a different path, or the rule_files directive sits in the wrong YAML location (under serverFiles.prometheus.yml instead of the structure the Helm chart expects).
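The broken layout might look like the following values.yaml sketch. The key layout and mount path are assumptions: the community prometheus chart, for example, mounts serverFiles under /etc/config/, so a glob pointing at /etc/prometheus/rules/ would match nothing.

```yaml
# Hypothetical Helm values sketch -- chart layout and paths are assumptions.
serverFiles:
  prometheus.yml:
    rule_files:
      # Moved here during the Dec 1 refactor; if the chart mounts
      # serverFiles under /etc/config/, this glob matches no files:
      - /etc/prometheus/rules/*.yaml
    scrape_configs:
      - job_name: api-server
        # ...
```

Listing the directory inside the running pod (e.g. with kubectl exec) and comparing it against the rule_files glob would confirm the mismatch.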

Hint 3 (after 15 min)

This is a monitoring stack (Prometheus + Alertmanager + Grafana) watching an API server cluster. Scraping is working fine: up returns 1 for all three instances. But on Dec 1, someone refactored the Prometheus Helm values and moved the rule_files configuration. The glob /etc/prometheus/rules/*.yaml does not match where the Helm chart actually mounts rule files (likely /etc/config/rules/ or similar), so Prometheus starts successfully but finds no matching rule files and evaluates zero alerting rules. Alertmanager is healthy but has nothing to alert on. Grafana dashboards show data (scraping works) but may show flat lines where alert-derived metrics used to exist. The scrape interval was also changed to 5 minutes (from a presumably shorter interval), which could make Grafana panels look sparse.
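A quick way to confirm that zero rules are loaded is to query Prometheus's /api/v1/rules endpoint and check whether any rule groups come back. A minimal Python sketch; the sample response below is hypothetical and stands in for what an HTTP GET against the endpoint would return in this scenario:

```python
import json

# Hypothetical /api/v1/rules response after the Dec 1 refactor:
# the API answers successfully, but the list of rule groups is empty.
sample_response = json.dumps({"status": "success", "data": {"groups": []}})

def loaded_rule_groups(api_response: str) -> list:
    """Return the names of the rule groups Prometheus has loaded."""
    body = json.loads(api_response)
    return [group["name"] for group in body["data"]["groups"]]

groups = loaded_rule_groups(sample_response)
if not groups:
    print("no alerting rules loaded -- check the rule_files path")
```

A healthy deployment would list api-server.rules here; an empty list confirms the misconfigured rule_files glob rather than a broken rule definition.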