Diagnostic Questions
Before revealing the investigation path:
- The Envoy sidecar shows "no healthy upstream" for inventory-service, but direct pod-to-pod calls (bypassing the mesh) work fine. What does this tell you about where the problem is: the network, the service, or the mesh control plane?
- `istioctl proxy-status` shows EDS as `STALE` for the order-service proxy. What does stale EDS mean, and what could prevent Istiod from pushing endpoint updates?
- Istiod logs show "forbidden: cannot list endpoints in namespace prod." What Kubernetes resource controls this permission, and how would you check whether it exists and is correctly configured?
- The ClusterRole was deleted by an automated RBAC cleanup controller. Why is the fix a security-domain change (exclude system roles from cleanup) rather than just recreating the role (Kubernetes) or fixing the mesh (networking)?
- How would you design a safe RBAC cleanup process that hardens security without breaking system components like Istio? What safeguards and exclusion mechanisms would you use?
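
For the last question, one possible starting point is a cleanup planner that never deletes anything itself and skips protected roles up front. The sketch below is a minimal illustration under stated assumptions, not the controller from this incident: the `ClusterRole` dataclass, the name prefixes, the `rbac-cleanup.example.com/keep` annotation, and the sample role names are all hypothetical.

```python
from dataclasses import dataclass, field

# All names below are illustrative assumptions, not values taken from the incident.
PROTECTED_PREFIXES = ("system:", "istio")          # assumed naming conventions
KEEP_ANNOTATION = "rbac-cleanup.example.com/keep"  # hypothetical opt-out annotation


@dataclass
class ClusterRole:
    name: str
    annotations: dict = field(default_factory=dict)
    has_bindings: bool = False  # True if any binding still references the role


def is_protected(role: ClusterRole) -> bool:
    """Return True if cleanup must never touch this role."""
    if role.name.startswith(PROTECTED_PREFIXES):
        return True
    return role.annotations.get(KEEP_ANNOTATION) == "true"


def plan_cleanup(roles):
    """Split roles into (candidates, skipped) without deleting anything."""
    candidates, skipped = [], []
    for role in roles:
        if is_protected(role) or role.has_bindings:
            skipped.append(role.name)
        else:
            candidates.append(role.name)
    return candidates, skipped


# Sample inventory; the role names are made up for the example.
roles = [
    ClusterRole("istiod-clusterrole-istio-system", has_bindings=True),
    ClusterRole("system:node"),
    ClusterRole("legacy-batch-reader"),
    ClusterRole("team-a-debug", annotations={KEEP_ANNOTATION: "true"}),
]
candidates, skipped = plan_cleanup(roles)
print("would delete:", candidates)
print("skipped:", skipped)
```

Keeping the planner separate from the deletion step means its output can be reviewed, or run as a dry run, before anything is removed. Skipping roles that still have bindings is a second safeguard that does not depend on naming conventions being followed.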