The Database Migration Weekend
Category: The Migration · Domains: database-ops, postgresql · Read time: ~5 min
Setting the Scene
I was the DBA-slash-SRE at a logistics company. We had a MySQL 5.7 database — 800GB, 340 tables, serving 15 applications. The decision to move to PostgreSQL 15 came from engineering leadership after a particularly painful encounter with MySQL's ONLY_FULL_GROUP_BY behavior and a growing desire for JSONB support and proper CTEs. We scheduled a 72-hour maintenance window over a holiday weekend. I printed out the migration runbook. It was 14 pages.
I'd tested the migration using pgloader on a 50GB subset. It took 4 hours. I extrapolated linearly and budgeted 16 hours for the full dataset, leaving 56 hours of padding. Should be plenty.
What Happened
Friday 6:00 PM — We put up the maintenance page, stopped application writes, and took a final mysqldump backup. I kicked off pgloader with our migration configuration file, cracked open a beer, and watched the logs scroll.
Friday 11:00 PM — The main tables were migrating. 200GB in, no errors. I went to sleep on the office couch with my laptop open to a terminal.
Saturday 8:00 AM — 580GB migrated. Then pgloader hit our audit_log table — 180 million rows with a MEDIUMBLOB column. MySQL MEDIUMBLOB maps to PostgreSQL BYTEA, but the rows contained mixed encodings. Some had Latin-1 text stored as binary. Some had actual binary data. pgloader choked on the encoding detection and started throwing invalid byte sequence for encoding "UTF8" errors.
Saturday 2:00 PM — I wrote a Python script to stream the audit_log table row by row, detect encoding per row with chardet, and convert to UTF-8 before inserting into PostgreSQL. It processed about 2,000 rows per second. With 180 million rows, that's 25 hours. We were at hour 20 of 72.
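The per-row conversion that script did can be sketched roughly as follows. The original used chardet for per-row detection; this simplified stand-in (my assumption, not the actual script) uses a try/except fallback instead, assuming each payload is valid UTF-8, Latin-1 text stored as binary, or true binary data that should pass through to BYTEA untouched:

```python
def normalize_blob(raw: bytes) -> bytes:
    """Return UTF-8 bytes for text-like payloads; pass real binary through.

    Simplified stand-in for the chardet-based detection described above:
    payloads containing a NUL byte are treated as opaque binary, valid
    UTF-8 is kept as-is, and anything else is assumed to be Latin-1 text
    stored as binary and is re-encoded as UTF-8.
    """
    if b"\x00" in raw:          # heuristic: NUL byte => genuine binary data
        return raw
    try:
        raw.decode("utf-8")     # already valid UTF-8, nothing to do
        return raw
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail
        return raw.decode("latin-1").encode("utf-8")
```

In the real script, each normalized row would then be inserted into PostgreSQL in batches; a NUL-byte heuristic like this is crude, which is why per-row detection with chardet was worth the throughput cost.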
Saturday 10:00 PM — The audit log was still streaming. Meanwhile, I discovered that MySQL TINYINT(1) — which we used as booleans — had migrated as integers, not PostgreSQL BOOLEAN. Fourteen applications expected true/false. They'd be getting 0 and 1. I wrote an ALTER TABLE script to convert 47 columns across 23 tables. Each ALTER on a large table took 10-40 minutes because PostgreSQL rewrites the table.
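The conversion script generated one ALTER per affected column along these lines; the table and column names below are hypothetical, but the USING clause is the key piece, since PostgreSQL needs to be told how to map the existing 0/1 integers onto booleans:

```python
def tinyint_to_boolean_ddl(columns):
    """Generate PostgreSQL DDL converting integer columns (ex-TINYINT(1))
    to BOOLEAN.

    `columns` is an iterable of (table, column) pairs. Each statement
    forces a full table rewrite, which is why the large tables took
    10-40 minutes apiece.
    """
    template = (
        "ALTER TABLE {t} ALTER COLUMN {c} TYPE boolean "
        "USING {c} <> 0;"
    )
    return [template.format(t=t, c=c) for t, c in columns]
```

For example, `tinyint_to_boolean_ddl([("shipments", "is_delivered")])` yields `ALTER TABLE shipments ALTER COLUMN is_delivered TYPE boolean USING is_delivered <> 0;`. The `<> 0` expression preserves MySQL's truthiness semantics (any non-zero value is true) and leaves NULLs as NULL.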
Sunday 6:00 AM — Audit log finally finished. Boolean columns converted. I started running the application test suites. Three applications failed because they used MySQL-specific DATE_FORMAT() calls that didn't exist in PostgreSQL. We had missed these in code review because they were in stored procedures that pgloader had skipped entirely.
Sunday 4:00 PM — Eight hours before our window closed. We rewrote 12 stored procedures, re-ran test suites, and fixed a permissions issue where our app user didn't have USAGE on sequences (MySQL auto-increment just works; PostgreSQL sequences need explicit grants).
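Most of the DATE_FORMAT() rewrites were mechanical specifier substitutions into PostgreSQL's to_char(). A sketch of the mapping we relied on (only the common specifiers; it ignores literal-text quoting, which to_char handles differently from MySQL):

```python
# Common MySQL DATE_FORMAT specifiers and their to_char equivalents.
# Not the full specifier set -- just the ones our procedures used.
MYSQL_TO_PG = {
    "%Y": "YYYY",  # 4-digit year
    "%m": "MM",    # zero-padded month
    "%d": "DD",    # zero-padded day
    "%H": "HH24",  # 24-hour clock hour
    "%i": "MI",    # minutes
    "%s": "SS",    # seconds
}

def translate_format(mysql_fmt: str) -> str:
    """Rewrite a MySQL DATE_FORMAT pattern as a to_char pattern."""
    out = mysql_fmt
    for my, pg in MYSQL_TO_PG.items():
        out = out.replace(my, pg)
    return out
```

So `DATE_FORMAT(created_at, '%Y-%m-%d %H:%i:%s')` becomes `to_char(created_at, 'YYYY-MM-DD HH24:MI:SS')`. The sequence permission fix, for the record, was a one-liner of the form GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO app_user; (role and schema names here are illustrative, not ours).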
Sunday 11:30 PM — All 15 applications passing tests. We opened traffic. I stayed up until 3 AM watching error rates. Clean.
The Moment of Truth
Saturday at 2 PM, staring at a 180-million-row table with mixed encodings, realizing the "linear extrapolation" from a clean 50GB subset was worthless because the subset didn't include the pathological data. The test migration succeeded because it tested the easy data. The hard data was in the long tail.
The Aftermath
PostgreSQL ran beautifully. Query performance improved 30% on complex joins. JSONB support unblocked two feature teams. But we spent two more weeks finding MySQL-isms in application code: IFNULL vs COALESCE, backtick quoting, MySQL's comma-form LIMIT offset, count syntax. Each one was a small fire. I kept a spreadsheet. There were 34.
The Lessons
- Do a full trial migration first: A subset test is not a migration test. You need the full dataset, including the ugly tables with mixed encodings and legacy data.
- Data type mapping is not trivial: TINYINT(1) to BOOLEAN, MEDIUMBLOB to BYTEA, DATETIME to TIMESTAMP. Each mapping has edge cases; audit every column type before migration day.
- Always pad your maintenance window: We used 71.5 of our 72 hours. If we'd budgeted 48, we would have had to roll back. Budget for the worst table, not the average table.
What I'd Do Differently
I'd run pgloader against the full production dataset in a throwaway PostgreSQL instance two weeks before the real migration. Every encoding error, type mismatch, and stored procedure incompatibility would surface then, not during the window. I'd also build a compatibility test suite that runs every application's queries against both databases in parallel using something like pg_chameleon for dual-write validation.
The Quote
"The test migration succeeded because it only tested the data that was easy to migrate."
Cross-References
- Topic Packs: PostgreSQL, Database Ops, Database Internals