Diagnostic Questions¶

Before revealing the investigation path:¶

You receive 47 alerts in 3 minutes. All are KubePodNotReady firing and resolving for the same service. Is the right first step to tune Alertmanager to suppress the noise, or to investigate why the pods are flapping?
The health endpoint responds in 0.8-1.2 seconds and the probe timeout is 1 second. The health check calls an Elasticsearch dependency. Is the application broken, or is the probe misconfigured?
DNS resolution for catalog-search.prod takes 723ms from the pod namespace but only 45ms from the monitoring namespace. What could cause namespace-specific DNS latency?
The platform team added extra search domains to pod DNS config. With ndots:5, how many DNS queries does a lookup for catalog-search.prod generate? Why does this cause intermittent probe failures?
The liveness probe checks dependency health (database, cache, Elasticsearch). Why is this a design mistake? What is the correct split between liveness and readiness probes?