| Identified misleading symptom | Checked Envoy config, noticed stale EDS, pivoted to Istiod logs within 10 min | Investigated Envoy config and DestinationRules before finding the Istiod error | Spent extended time on mesh configuration, VirtualServices, or mTLS settings |
| Found root cause in Kubernetes domain | Traced "forbidden" error to missing ClusterRole; found the RBAC cleanup controller as the deleter | Found the missing ClusterRole but not why it was deleted | Assumed Istiod needed a restart or the mesh was misconfigured |
| Remediated in security domain | Restored role, configured cleanup controller exclusions, labeled all system roles | Restored the role but did not prevent the cleanup controller from deleting it again | Recreated the ClusterRole manually without addressing the root cause |
| Cross-domain thinking | Explained the full chain: security automation -> RBAC deletion -> Istiod permission loss -> EDS stale -> Envoy 503s | Acknowledged the RBAC/mesh dependency but missed the automation angle | Treated it as a mesh networking problem |