
The Auth System Swap

Category: The Migration | Domains: security, identity | Read time: ~5 min


Setting the Scene

I was the security-focused SRE at a healthcare scheduling platform — 45,000 daily active users, about 800 API requests per second. We were replacing our homegrown JWT authentication system with Auth0. The homegrown system had been built five years earlier by a backend developer who'd read a blog post about JWTs and implemented everything from scratch: token issuance, validation, refresh, and revocation. It worked, but it had no MFA support, no SSO, and the signing key was a static RSA key stored in a config file that hadn't been rotated since 2021.

The plan: stand up Auth0, migrate user accounts, update all services to validate Auth0 tokens, cut over. "Two sprints," the product manager said. It took five sprints, and we broke login for 8,000 users along the way.

What Happened

Week 1-2 — Auth0 tenant setup. We configured the tenant, imported our 180,000 user records using Auth0's bulk import API. Passwords couldn't be migrated (bcrypt hashes with a custom salt scheme), so we configured Auth0's "custom database" connection to validate against our old user table on first login, then re-hash and store in Auth0. Clever, and it actually worked. Users would transparently migrate on their next login.
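The migrate-on-first-login flow can be sketched roughly as follows. This is an illustration in Python, not the script we ran: Auth0's custom database scripts are written in JavaScript, and here `new_store`, `hash_new`, and `legacy_verify` are hypothetical stand-ins. PBKDF2 appears only because it ships with the standard library; it is not a claim about how Auth0 stores passwords.

```python
import hashlib
import hmac
import os
from typing import Callable, Dict

# Hypothetical stand-in for the new identity store (Auth0 in the story).
new_store: Dict[str, bytes] = {}

SALT_LEN = 16


def hash_new(password: str, salt: bytes) -> bytes:
    """Hash for the new store: salt followed by the PBKDF2 digest."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt + digest


def login(email: str, password: str,
          legacy_verify: Callable[[str, str], bool]) -> bool:
    """Authenticate against the new store, migrating from legacy on first login."""
    stored = new_store.get(email)
    if stored is not None:
        # Already migrated: the legacy system is no longer consulted.
        salt = stored[:SALT_LEN]
        return hmac.compare_digest(stored, hash_new(password, salt))
    # Not migrated yet: only the legacy system can check the old hash.
    if legacy_verify(email, password):
        # The plaintext is available at this moment, so re-hash and store it.
        new_store[email] = hash_new(password, os.urandom(SALT_LEN))
        return True
    return False
```

The key property is that the plaintext password is only ever available during a successful login, which is why the migration has to happen lazily rather than in the bulk import.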

Week 3 — Token format mismatch. Our old system issued JWTs with { "user_id": 12345, "role": "admin" } in the payload. Auth0 issues JWTs with { "sub": "auth0|abc123", "permissions": ["read:patients"] }. Every service that parsed the JWT — and that was 14 services — expected user_id as an integer and role as a string. Auth0's sub claim is a string with a provider prefix. We had to add an Auth0 Rule (now Action) to inject custom claims, and update every service to check both formats.
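A compatibility layer for the two payload shapes might look something like the sketch below. The namespaced claim names (`https://example.com/user_id` and `https://example.com/role`) are assumptions for illustration; the actual names depend on what the Auth0 Action injects. `UserContext` and `to_user_context` are likewise hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical namespace for the custom claims injected by the Auth0 Action.
NS = "https://example.com/"


@dataclass(frozen=True)
class UserContext:
    user_id: int   # the integer ID that downstream services already expect
    role: str
    source: str    # "legacy" or "auth0", useful for migration metrics


def to_user_context(claims: Mapping[str, Any]) -> UserContext:
    """Map either token payload onto one internal shape."""
    try:
        if "sub" in claims:
            # New format: the legacy ID and role travel as custom claims.
            return UserContext(int(claims[NS + "user_id"]),
                               str(claims[NS + "role"]), "auth0")
        if "user_id" in claims:
            # Old format: { "user_id": 12345, "role": "admin" }
            return UserContext(int(claims["user_id"]),
                               str(claims["role"]), "legacy")
    except KeyError as missing:
        raise ValueError(f"token payload is missing claim {missing}") from None
    raise ValueError("unrecognized token payload")
```

With a function like this at the edge, the 14 services only need to understand `UserContext`, not two JWT layouts.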

Week 4-5 — The dual-token period. During migration, some users had old tokens (from sessions started before cutover) and some had new Auth0 tokens. Our API gateway needed to validate both. I wrote middleware that tried Auth0 JWKS validation first, fell back to the old RSA key validation, and mapped both token formats to a unified internal user context. It was 200 lines of careful code with a test suite I was very proud of.
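The shape of that fallback logic, stripped of the real cryptography, is roughly the following. The verifiers are injected as plain callables so the sketch stays self-contained; in practice one would wrap a JWKS-based check and the other the old RSA public key check. `InvalidToken` and `validate_any` are hypothetical names, not the original middleware.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Claims = Dict[str, Any]
# A verifier returns the token's claims or raises InvalidToken.
Verifier = Callable[[str], Claims]


class InvalidToken(Exception):
    """Raised when a verifier rejects a token (bad signature, expired, ...)."""


def validate_any(token: str,
                 verifiers: Sequence[Tuple[str, Verifier]]) -> Tuple[str, Claims]:
    """Try each verifier in order; return (name, claims) from the first to accept."""
    for name, verify in verifiers:
        try:
            return name, verify(token)
        except InvalidToken:
            # Rejected by this verifier: fall through to the next one.
            continue
    raise InvalidToken("token rejected by all verifiers")
```

Only `InvalidToken` is caught on purpose. An infrastructure failure, such as being unable to fetch signing keys, should surface as an error rather than silently fall through to the legacy path.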

Week 6 — Session invalidation incident. We flipped the login page to Auth0 on a Tuesday at 10 AM. New logins got Auth0 tokens. But 8,000 users had active sessions with old tokens — tokens that wouldn't expire for another 23 hours (our old system used 24-hour expiry, which in retrospect was way too long). Those users were fine until they tried to refresh. The refresh endpoint was now Auth0's, and it didn't recognize old refresh tokens. Users got silently logged out mid-session. Support tickets spiked. The phrase "I lost my patient schedule" appeared 340 times in Zendesk that day.

Week 7 — Emergency patch. I added a /legacy-refresh endpoint that accepted old refresh tokens, validated them against the old system, and issued new Auth0 tokens via the Management API. Users hitting this endpoint got transparently migrated without a logout. We deployed it at 2 AM Wednesday. By Thursday, the ticket volume was back to normal.
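In outline, the exchange behind such an endpoint could look like this. All three callables are hypothetical stand-ins: `lookup_legacy_session` for validation against the old system, `issue_new_tokens` for whatever mints the new tokens, and `revoke_legacy` for invalidating the old refresh token. None of them correspond to a real Auth0 API call.

```python
from typing import Callable, Dict, Optional


def legacy_refresh(
    old_refresh_token: str,
    lookup_legacy_session: Callable[[str], Optional[int]],
    issue_new_tokens: Callable[[int], Dict[str, str]],
    revoke_legacy: Callable[[str], object],
) -> Optional[Dict[str, str]]:
    """Exchange a valid legacy refresh token for new-system tokens, once."""
    user_id = lookup_legacy_session(old_refresh_token)
    if user_id is None:
        # Unknown, expired, or already exchanged: the caller must log in again.
        return None
    tokens = issue_new_tokens(user_id)
    # Revoke only after issuing succeeded, so a failure above leaves the
    # user's old session usable for a retry.
    revoke_legacy(old_refresh_token)
    return tokens
```

Making the exchange one-time means a leaked legacy refresh token cannot be replayed indefinitely after its owner has migrated.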

Week 8 — Old token sunset. We set a hard deadline: old tokens would stop being accepted on March 15th. Sent email notifications at 14 days, 7 days, and 1 day. On March 15th, I flipped the middleware to Auth0-only validation. 47 users got logged out. They called support. Support told them to log in again. Nobody filed a complaint. I removed the legacy validation code and the old RSA key from every service's config. Then I rotated the key one final time, just for closure.

The Moment of Truth

Tuesday at 2 PM, watching the Zendesk ticket counter climb as 8,000 users lost their sessions. We'd planned the login cutover but hadn't planned the session transition. The new system couldn't refresh old tokens, and we hadn't thought about the users who were already logged in and wouldn't see the new login page until their current session expired. Authentication migrations aren't just about new logins — they're about existing sessions.

The Aftermath

Auth0 gave us MFA, SSO with three enterprise clients, and passwordless login — all things that would have taken months to build in-house. The total user disruption was about 18 hours for the 8,000 affected users. Three months later, the old user database was decommissioned and that static RSA key was finally gone from every config file. The security auditor was happy for the first time in two years.

The Lessons

  1. Plan the token transition: Old tokens and new tokens will coexist. Your services need to validate both formats during migration. Write the dual-validation middleware before cutover day, not during.
  2. Support both token formats during migration: Map old claims to new claims in a compatibility layer. Don't force a flag day where every service must update simultaneously — that's how you get partial outages.
  3. Session invalidation is user-visible: When you change auth systems, every active session becomes a potential logout event. Plan for it: extend old token validity, provide a legacy refresh path, or notify users to re-authenticate proactively.

What I'd Do Differently

I'd implement the legacy refresh endpoint before the cutover, not after. On cutover day, users with old tokens would seamlessly get new Auth0 tokens on their next refresh. No logouts, no Zendesk tickets. I'd also reduce the old token expiry to 1 hour in the weeks before migration, so the window of "users with old tokens" would be 60 minutes instead of 24 hours. And I'd run a shadow mode for a week — route login requests to both systems, compare results, but only use the old system's response — to catch edge cases before they hit users.
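The shadow mode described above could be sketched along these lines. `primary` stands for the old system's login check, `shadow` for the new one, and `report` for wherever mismatches get logged; all three are hypothetical names, and the sketch assumes each login check returns a plain boolean.

```python
from typing import Callable

LoginCheck = Callable[[str, str], bool]


def shadow_login(
    email: str,
    password: str,
    primary: LoginCheck,
    shadow: LoginCheck,
    report: Callable[[str, bool, object], None],
) -> bool:
    """Return the primary system's answer; compare the shadow's on the side."""
    result = primary(email, password)
    try:
        shadow_result = shadow(email, password)
        if shadow_result != result:
            # Disagreement is the edge case worth investigating before cutover.
            report(email, result, shadow_result)
    except Exception as exc:
        # A shadow failure must never affect the user; record it and move on.
        report(email, result, exc)
    return result
```

In a real deployment the shadow call would probably run asynchronously so it cannot add latency to the login path; it is kept inline here for brevity.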

The Quote

"We planned the login migration perfectly. We just forgot that 8,000 people were already logged in."
