| Identified the misleading symptom | Distinguished token expiry from certificate expiry; identified the expired X.509 wrapping certificate as the issue | Found the expired certificate but took time to understand the X.509/JWKS relationship | Investigated the Keycloak TLS certificate, OIDC discovery, or token claims instead |
| Found the root cause in the Kubernetes domain | Traced the failures to Keycloak's PASSIVE key with an expired wrapping certificate; understood the ACTIVE/PASSIVE key lifecycle | Found the expired key but did not explain why it affected only 34% of users | Assumed Keycloak was misconfigured or that key rotation was broken |
| Remediated in the cloud domain | Regenerated the certificate via KMS and set up automated renewal via Lambda | Renewed the certificate but did not automate future renewals | Manually rotated the key or disabled v1 entirely (breaking existing tokens) |
| Cross-domain thinking | Explained the full chain: KMS cert TTL -> certificate expiry -> X.509 validation failure -> intermittent auth failures | Acknowledged the multi-layer cert/key relationship but missed the KMS automation gap | Treated it as a single-domain OIDC or certificate-management issue |
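The automated renewal rewarded above comes down to a periodic expiry check that does not skip PASSIVE keys. In Keycloak, PASSIVE keys no longer sign new tokens but are still used to verify them. The following is a minimal sketch of that decision logic only, assuming a hypothetical inventory of realm keys. The `SigningKey` type, the `keys_needing_renewal` function, the key ID `v2`, and the 30-day window are illustrative and are not part of the Keycloak, KMS, or Lambda APIs. A scheduled job would still need to fetch real key metadata and call KMS to reissue certificates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SigningKey:
    """Hypothetical record of one realm key and its wrapping X.509 certificate."""
    kid: str
    status: str  # "ACTIVE", "PASSIVE", or "DISABLED"
    cert_not_after: datetime  # expiry of the certificate wrapping the key


def keys_needing_renewal(keys, now, window=timedelta(days=30)):
    """Return the kids of keys still used for validation whose certificate
    has already expired or will expire within `window` of `now`."""
    deadline = now + window
    return [
        k.kid
        for k in keys
        # PASSIVE keys still verify previously issued tokens, so checking
        # only the ACTIVE key would miss the failure mode described above.
        if k.status in ("ACTIVE", "PASSIVE") and k.cert_not_after <= deadline
    ]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    keys = [
        SigningKey("v2", "ACTIVE", now + timedelta(days=300)),
        SigningKey("v1", "PASSIVE", now - timedelta(days=2)),  # already expired
        SigningKey("v0", "DISABLED", now - timedelta(days=400)),  # ignored
    ]
    print(keys_needing_renewal(keys, now))  # ['v1']
```

Filtering on status rather than on "is this the signing key" is the deliberate choice here. An expired certificate on a key that only validates tokens can still fail the subset of requests carrying tokens with that key's `kid`.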