
Symptoms: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation

Domains: security | kubernetes_ops | cloud
Level: L3
Estimated time: 45 min

Initial Alert

Application monitoring fires at 07:55 UTC:

CRITICAL: auth_failures_rate > 10%
  service: auth-gateway
  namespace: prod
  failure_type: token_validation_error
  rate: 34% of login attempts failing
  message: "OIDC token validation failed: unable to verify signature"

Follow-up:

CRITICAL: user-service — 401 Unauthorized rate spike to 28%
WARNING: Customer support ticket volume spike — "cannot log in" reports
CRITICAL: SLO — auth success rate 66% (target: 99.95%)
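To put the SLO alert in perspective, the gap between a 66% success rate and a 99.95% target can be expressed as an error-budget burn rate. The sketch below is only illustrative arithmetic on the two figures from the alert; the function name `burn_rate` is hypothetical and not part of any monitoring tool mentioned here.

```python
def burn_rate(observed_success: float, slo_target: float) -> float:
    """Return how many times faster than allowed the error budget is consumed."""
    allowed_error = 1.0 - slo_target          # error fraction the SLO tolerates
    observed_error = 1.0 - observed_success   # error fraction actually seen
    return observed_error / allowed_error


# Figures taken from the alert: 66% success against a 99.95% target.
rate = burn_rate(observed_success=0.66, slo_target=0.9995)
print(f"burn rate: {rate:.0f}x")
```

With these inputs the observed error rate (34%) is roughly 680 times the 0.05% the SLO allows, which is why the alert is CRITICAL rather than a slow-burn warning.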

Observable Symptoms

  • 34% of user login attempts fail with "unable to verify signature" error.
  • The failures are intermittent — some logins succeed, others fail.
  • The auth-gateway logs show "OIDC token validation failed: x509: certificate has expired or is not yet valid".
  • The OIDC provider (Keycloak, running in the cluster) is reachable and issuing tokens.
  • Keycloak's login page works — users can enter credentials. The failure happens when the auth-gateway validates the returned JWT.
  • The auth-gateway was last deployed 2 weeks ago. No configuration changes.
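The x509 message in the auth-gateway log covers two distinct conditions: the current time is past the certificate's end of validity, or it is before the start of validity (for example, because of clock skew). A minimal sketch of that check follows, assuming the validity bounds have already been read from the certificate in question. The function name and all dates are hypothetical and used only for illustration.

```python
from datetime import datetime, timezone


def validity_status(not_before: datetime, not_after: datetime, now: datetime) -> str:
    """Classify a certificate's validity window relative to `now`."""
    if now < not_before:
        return "not yet valid"   # clock skew or a certificate issued for the future
    if now > not_after:
        return "expired"         # the validity period has ended
    return "valid"


# Hypothetical validity window, not taken from the incident.
not_before = datetime(2023, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2024, 1, 1, tzinfo=timezone.utc)

for now in (
    datetime(2022, 12, 31, tzinfo=timezone.utc),
    datetime(2023, 6, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
):
    print(now.date(), validity_status(not_before, not_after, now))
```

Distinguishing the two cases early narrows the search: "not yet valid" points toward clocks, while "expired" points toward whichever certificate was not renewed.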

The Misleading Signal

Authentication failures that mention "certificate expired" in an OIDC context look like a problem with the identity provider. Engineers investigate the Keycloak TLS certificate, the OIDC discovery endpoint, the JWKS (JSON Web Key Set) endpoint, and the token signing keys. Because the failures are intermittent (a 66% success rate), they suspect a race condition or a partially applied configuration in the identity stack. The security team begins auditing Keycloak's realm configuration and key rotation settings.
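When the investigation reaches the token signing keys, one common step is to read the `kid` (key ID) from the header of a failing JWT and compare it with the keys published at the JWKS endpoint. The sketch below decodes the header only and does not verify the signature. The sample token is built locally, and its `kid` value is a made-up placeholder, not a key from this incident.

```python
import base64
import json


def decode_jwt_header(token: str) -> dict:
    """Decode the first (header) segment of a JWT without verifying it."""
    header_b64 = token.split(".")[0]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (helper for the sample)."""
    raw = json.dumps(data, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token: header and payload are illustrative, signature is a dummy.
sample_token = ".".join([
    _b64url({"alg": "RS256", "typ": "JWT", "kid": "example-key-1"}),
    _b64url({"sub": "user-123"}),
    "dummy-signature",
])

header = decode_jwt_header(sample_token)
print(header["alg"], header["kid"])
```

If failing and succeeding tokens carry the same `kid` and that key is present in the JWKS, the signing keys themselves are less likely to be the cause.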