Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
Domains: security | kubernetes_ops | cloud
Level: L3
Estimated time: 45 min
Initial Alert
Application monitoring fires at 07:55 UTC:
CRITICAL: auth_failures_rate > 10%
service: auth-gateway
namespace: prod
failure_type: token_validation_error
rate: 34% of login attempts failing
message: "OIDC token validation failed: unable to verify signature"
Follow-up:
CRITICAL: user-service — 401 Unauthorized rate spike to 28%
WARNING: Customer support ticket volume spike — "cannot log in" reports
CRITICAL: SLO — auth success rate 66% (target: 99.95%)
Observable Symptoms
- 34% of user login attempts fail with "unable to verify signature" error.
- The failures are intermittent — some logins succeed, others fail.
- The auth-gateway logs show "OIDC token validation failed: x509: certificate has expired or is not yet valid".
- The OIDC provider (Keycloak, running in the cluster) is reachable and issuing tokens.
- Keycloak's login page works — users can enter credentials. The failure happens when the auth-gateway validates the returned JWT.
- The auth-gateway was last deployed 2 weeks ago. No configuration changes.
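The x509 error in the auth-gateway log covers two distinct cases: a certificate past its end date and a certificate whose start date has not yet arrived. As a minimal sketch, assuming the `notBefore` and `notAfter` strings have already been pulled from the suspect certificate (for example with `openssl x509 -noout -dates`), the two cases can be told apart with the Python standard library. The function name `classify_validity` and the sample dates are hypothetical and not taken from this incident.

```python
import ssl


def classify_validity(not_before: str, not_after: str, now_epoch: float) -> str:
    """Classify a certificate validity window relative to now_epoch (UTC epoch seconds).

    Both date strings use the OpenSSL text format, e.g. "Jun 14 00:00:00 2024 GMT".
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now_epoch < start:
        return "not yet valid"
    if now_epoch > end:
        return "expired"
    return "valid"


# Hypothetical dates for illustration only.
now = ssl.cert_time_to_seconds("Jun 15 07:55:00 2024 GMT")
print(classify_validity("Jan 10 00:00:00 2024 GMT", "Jun 14 00:00:00 2024 GMT", now))
```

Passing the current time in explicitly, rather than reading the system clock inside the function, also makes it easy to test whether clock skew on a node would change the result.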
The Misleading Signal
Authentication failures with "certificate expired" in an OIDC context look like a security/identity provider problem. Engineers investigate the Keycloak TLS certificate, OIDC discovery endpoint, JWKS (JSON Web Key Set) endpoint, and token signing keys. The intermittent nature (66% success rate) suggests a race condition or a partial configuration issue in the identity stack. The security team begins auditing Keycloak's realm configuration and key rotation settings.
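One of the checks named above, comparing a token's signing key against the JWKS, can be done quickly to rule that path in or out. The following is a minimal sketch using only the Python standard library. It assumes the token is a compact JWT (three dot-separated base64url segments) and that the JWKS document has already been fetched and parsed into a dict. The helper names `jwt_header` and `kid_in_jwks` are hypothetical, and the code deliberately does not verify the signature.

```python
import base64
import json


def jwt_header(token: str) -> dict:
    """Decode the unverified header segment of a compact JWT."""
    header_b64 = token.split(".")[0]
    # base64url segments omit padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def kid_in_jwks(token: str, jwks: dict) -> bool:
    """Return True if the token's `kid` matches a key listed in the JWKS."""
    kid = jwt_header(token).get("kid")
    if kid is None:
        return False
    return any(key.get("kid") == kid for key in jwks.get("keys", []))
```

If `kid_in_jwks` returns True for both failing and succeeding tokens, the key set published by the provider is probably consistent with what it signs, which weakens the theory that the identity stack itself is at fault.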