Investigation: User Auth Failing, OIDC Cert Expired, Fix Is Cloud KMS Rotation
Phase 1: Security Investigation (Dead End)
Check the Keycloak OIDC discovery endpoint:
$ curl -s https://auth.example.com/realms/prod/.well-known/openid-configuration | jq '.jwks_uri'
"https://auth.example.com/realms/prod/protocol/openid-connect/certs"
$ curl -s https://auth.example.com/realms/prod/protocol/openid-connect/certs | jq '.keys | length'
2
Two keys in the JWKS. Check them:
$ curl -s https://auth.example.com/realms/prod/protocol/openid-connect/certs | jq '.keys[] | {kid, use, kty}'
{
"kid": "realm-key-v2",
"use": "sig",
"kty": "RSA"
}
{
"kid": "realm-key-v1",
"use": "sig",
"kty": "RSA"
}
Two signing keys: v1 and v2. This is normal during key rotation. Check if Keycloak is signing tokens with the correct key:
# Decode a recently issued JWT header
$ TOKEN=$(curl -s -X POST "https://auth.example.com/realms/prod/protocol/openid-connect/token" \
-d "client_id=test-client&grant_type=password&username=test&password=test123" | jq -r '.access_token')
$ echo $TOKEN | cut -d. -f1 | base64 -d 2>/dev/null | jq .
{
"alg": "RS256",
"typ": "JWT",
"kid": "realm-key-v2"
}
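The plain base64 -d step above can fail or truncate its output on some tokens, because JWT segments are base64url-encoded with the trailing padding stripped. A minimal sketch of a more tolerant decoder, assuming bash and a base64 binary that accepts -d (GNU coreutils); the function name jwt_segment is hypothetical and not part of any tool used in this investigation:

```shell
#!/usr/bin/env bash
# Decode one base64url-encoded JWT segment.
# $1 = full token, $2 = segment index (1 = header, 2 = payload).
jwt_segment() {
  local token=$1 index=$2 seg
  # Pick the segment and map the URL-safe alphabet back to standard base64.
  seg=$(printf '%s' "$token" | cut -d. -f"$index" | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  while (( ${#seg} % 4 != 0 )); do
    seg+="="
  done
  printf '%s' "$seg" | base64 -d
}
```

With TOKEN set as above, jwt_segment "$TOKEN" 1 prints the header JSON, which can then be piped to jq as before.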
Tokens are signed with realm-key-v2. The auth-gateway should validate against the JWKS endpoint. Check the auth-gateway configuration:
$ kubectl get deployment auth-gateway -n prod -o yaml | grep -A5 "OIDC"
- name: OIDC_ISSUER_URL
value: "https://auth.example.com/realms/prod"
- name: OIDC_JWKS_CACHE_TTL
value: "3600"
The auth-gateway caches JWKS for 1 hour. Key rotation should therefore be transparent: the gateway fetches the latest JWKS and validates against every key it contains. Check the auth-gateway logs more carefully:
$ kubectl logs deploy/auth-gateway -n prod --tail=30 | grep -iE "error|fail|cert"
2026-03-19T07:55:12Z ERROR token validation: x509: certificate has expired or is not yet valid
2026-03-19T07:55:12Z ERROR cert serial: 1A2B3C4D, issuer: CN=keycloak-signing-ca
2026-03-19T07:55:12Z ERROR not_after: 2026-03-18T23:59:59Z (expired 7h55m ago)
The error names keycloak-signing-ca. This is not the TLS certificate; it is the X.509 certificate that wraps the RSA signing key. Keycloak wraps each signing key in a self-signed X.509 certificate, and that certificate has expired.
The Pivot
Check the Keycloak key certificates:
$ kubectl exec keycloak-0 -n auth -- /opt/keycloak/bin/kcadm.sh get keys -r prod 2>/dev/null | jq '.keys[] | {kid, status, certificate_expiry}'
{
"kid": "realm-key-v2",
"status": "ACTIVE",
"certificate_expiry": "2027-03-19T00:00:00Z"
}
{
"kid": "realm-key-v1",
"status": "PASSIVE",
"certificate_expiry": "2026-03-18T23:59:59Z"
}
The certificate for realm-key-v1 expired yesterday. The key is in PASSIVE state (used to validate old tokens, not to sign new ones). But 34% of validations are failing, which means 34% of requests carry tokens signed with v1.
Phase 2: Kubernetes Investigation (Root Cause)
Why do 34% of requests still have v1-signed tokens? Check the auth-gateway's JWKS cache:
$ kubectl exec deploy/auth-gateway -n prod -- curl -s http://localhost:8081/debug/jwks-cache | jq '.keys | length'
2
The auth-gateway has both keys cached. But it validates the X.509 certificate attached to each key in addition to verifying the token signature. Once v1's wrapping certificate expired, the gateway began rejecting tokens signed with v1, even though the key itself is still cryptographically valid.
Users with long-lived refresh tokens (issued before the key rotation) still get access tokens signed with v1. The token exp (expiry) has not passed, but the key's X.509 certificate has expired.
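The two independent checks described above can be modeled in a few lines. This is a hypothetical sketch of the decision logic, not the auth-gateway's actual code: the function names are invented, GNU date is assumed for parsing ISO 8601 timestamps, and the token exp value in the example call is made up for illustration. The certificate expiry and the current time are taken from the log lines earlier in this investigation.

```shell
#!/usr/bin/env bash
# Convert an ISO 8601 UTC timestamp to epoch seconds (requires GNU date).
to_epoch() { date -u -d "$1" +%s; }

# Model a validator that checks the token's own expiry AND the validity
# of the certificate wrapping the signing key.
# $1 = token exp, $2 = certificate not_after, $3 = current time.
check_token_key() {
  local token_exp cert_not_after now late
  token_exp=$(to_epoch "$1")
  cert_not_after=$(to_epoch "$2")
  now=$(to_epoch "$3")
  if (( now >= token_exp )); then
    echo "reject: token expired"
  elif (( now > cert_not_after )); then
    # Token is still within its lifetime, but the key's wrapper is not.
    late=$(( now - cert_not_after ))
    echo "reject: key certificate expired $(( late / 3600 ))h$(( late % 3600 / 60 ))m ago"
  else
    echo "accept"
  fi
}

# Token exp (hypothetical), v1 certificate not_after, time of the log line.
check_token_key 2026-03-19T12:00:00Z 2026-03-18T23:59:59Z 2026-03-19T07:55:12Z
# prints: reject: key certificate expired 7h55m ago
```

A validator that only verified the signature would skip the second branch and accept the same token.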
But why was the X.509 certificate set to expire? Check how the signing keys are generated:
$ kubectl get secret keycloak-signing-keys -n auth -o jsonpath='{.data.realm-key-v1\.pem}' | base64 -d | openssl x509 -noout -text | grep -E "Not Before|Not After|Issuer"
Issuer: CN = keycloak-signing-ca
Not Before: Mar 19 00:00:00 2025 GMT
Not After : Mar 18 23:59:59 2026 GMT
The key certificate has a 1-year validity period. Keycloak was configured to use AWS KMS for key material, with auto-generated wrapping certificates that have a 1-year TTL. When the certificate expires, the key is still valid in KMS, but validators that check the certificate's validity period reject the X.509 wrapper.
Domain Bridge: Why This Crossed Domains
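An expiry like this can be caught ahead of time with the checkend option of openssl x509, which exits non-zero if the certificate expires within the given number of seconds. A minimal sketch, assuming bash and the openssl CLI; the function name check_cert, the status strings, and the 30-day default are hypothetical choices, not something the existing setup provides:

```shell
#!/usr/bin/env bash
# Classify a PEM certificate as EXPIRED, EXPIRING, or OK.
# $1 = path to PEM file, $2 = warning window in days (default 30).
check_cert() {
  local pem=$1 warn_days=${2:-30}
  if [ ! -r "$pem" ]; then
    echo "UNREADABLE"
    return 2
  fi
  # -checkend N exits 0 only if the cert is still valid N seconds from now.
  if ! openssl x509 -in "$pem" -noout -checkend 0 >/dev/null; then
    echo "EXPIRED"
  elif ! openssl x509 -in "$pem" -noout -checkend $(( warn_days * 86400 )) >/dev/null; then
    echo "EXPIRING"
  else
    echo "OK"
  fi
}
```

For example, after writing the decoded secret from the command above to a file such as realm-key-v1.pem, running check_cert realm-key-v1.pem 30 on a schedule would flag the wrapper certificate before validators start rejecting it.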
Key insight: The symptom was OIDC authentication failures (security). The root cause was an expired X.509 certificate wrapping the old signing key in Keycloak (Kubernetes operations). The fix requires rotating the KMS key material and regenerating the wrapping certificate (cloud).
This pattern is common because OIDC signing keys have multiple layers: the cryptographic key itself, the X.509 certificate wrapping it, and the JWKS endpoint serving it. Key rotation handles the signing key but may not handle the certificate wrapper. Cloud KMS adds another layer of abstraction.
Root Cause
The Keycloak OIDC signing key (realm-key-v1) was wrapped in a self-signed X.509 certificate with a 1-year validity period. The certificate expired while the key was still in PASSIVE state for validating existing tokens. The auth-gateway performs X.509 certificate validation as part of JWT signature verification, so it rejects any token signed with a key whose wrapping certificate has expired. 34% of requests carried tokens signed with the old key, causing intermittent authentication failures.