Comparison: Relational Databases¶
Category: Storage Last meaningful update consideration: 2026-03 Verdict (opinionated): PostgreSQL unless Aurora's auto-scaling storage model or MySQL read replica performance specifically fits your workload. PostgreSQL is the safer, more capable default.
Quick Decision Matrix¶
| Factor | PostgreSQL | MySQL | Aurora (PostgreSQL/MySQL compatible) |
|---|---|---|---|
| Learning curve | Medium | Low-Medium | Medium (Aurora-specific behaviors) |
| Operational overhead | Medium (self-hosted) / Low (RDS) | Medium (self-hosted) / Low (RDS) | Low (AWS-managed) |
| Cost at small scale | Free (self-hosted) / RDS pricing | Free (self-hosted) / RDS pricing | Higher than RDS (Aurora I/O costs) |
| Cost at large scale | Medium | Medium | Variable (can be cheaper or much more expensive) |
| Community/ecosystem | Massive | Massive | AWS-only |
| Hiring | Easy | Easy | AWS + DB skills |
| SQL compliance | Excellent (most standards-compliant) | Good (with quirks) | Inherits from PG or MySQL |
| JSON support | Excellent (JSONB, indexing, operators) | Basic (JSON type, limited indexing) | Inherits |
| Extensions | Rich (PostGIS, pg_stat_statements, TimescaleDB, etc.) | Limited (plugins, UDFs) | Limited subset of PG/MySQL |
| Replication | Streaming replication, logical replication | Native replication (async, semi-sync, group) | Built-in (6-way storage replication) |
| Connection pooling | External (PgBouncer) | Built-in (thread per connection) | Proxy (RDS Proxy) |
| MVCC model | True MVCC (dead tuples, VACUUM) | MVCC (InnoDB, undo logs) | Storage-level MVCC |
| Partitioning | Declarative (PG 10+) | Native (range, list, hash) | Inherits |
| Full-text search | Built-in (tsvector, GIN index) | Built-in (MyISAM, InnoDB) | Inherits |
When to Pick Each¶
Pick PostgreSQL when:¶
- You want the most capable, standards-compliant relational database
- Your workload uses advanced SQL features: CTEs, window functions, array types, JSONB
- You need extensions: PostGIS for geospatial, TimescaleDB for time-series, pg_cron for scheduling
- Your team values data integrity — PostgreSQL's constraint enforcement is strictest
- You want the option to run on any cloud or on-premises
- The default choice for new projects unless there is a specific reason not to
Pick MySQL when:¶
- Your application framework has better MySQL support or optimization (some PHP and Ruby frameworks historically)
- Read-heavy workloads where MySQL's read replica performance is well-tuned
- Your team has deep MySQL expertise and operational runbooks
- You are running WordPress, Drupal, or other MySQL-centric platforms
- Simpler operational model is preferred (MySQL has fewer knobs to turn)
Pick Aurora when:¶
- You are AWS-only and want auto-scaling storage (no pre-provisioning disk)
- You need fast failover (Aurora's storage-level replication enables sub-30-second failover)
- Read replica performance matters — Aurora replicas share storage and have minimal replication lag
- Your workload is bursty and Aurora Serverless v2 scaling fits your access pattern
- You need cross-region replication (Aurora Global Database)
Nobody Tells You¶
PostgreSQL¶
- VACUUM is PostgreSQL's dirty secret. Dead tuples from UPDATE/DELETE accumulate and must be cleaned up. Autovacuum handles this but requires tuning for write-heavy tables. A blocked autovacuum leads to table bloat, then slow queries, then transaction ID wraparound — which is a database emergency.
- Connection management is PostgreSQL's weakness. Each connection spawns a process (~10MB RAM). Without PgBouncer or PgPool, 500 connections consume 5GB of RAM just for processes. Application-side connection pooling helps but does not eliminate the need.
pg_stat_statementsis not enabled by default. Enable it immediately — it is the single most important extension for understanding query performance.- PostgreSQL's EXPLAIN output is detailed but requires expertise to interpret. The "Seq Scan on large_table" pattern is the most common performance problem and is fixed by adding appropriate indexes.
- Logical replication (PG 10+) enables real-time replication of specific tables to different databases. This is powerful for data warehouse feeding but has gotchas: DDL changes do not replicate, and schema drift between publisher and subscriber causes silent failures.
- PostgreSQL upgrades (major versions) require either
pg_upgrade(fast but downtime) or logical replication (zero-downtime but complex). There is no magic button. - Extensions run in the database process. A buggy or malicious extension can corrupt your database. Be selective about which extensions you install.
MySQL¶
- MySQL has two storage engines that matter: InnoDB (default, transactional, ACID) and MyISAM (legacy, non-transactional, deprecated). If you see MyISAM tables in production, migrate them immediately.
- MySQL's
utf8character set is NOT actual UTF-8. It is a 3-byte subset that cannot store emoji or many CJK characters. Useutf8mb4for real UTF-8. This catches people every time. - MySQL's GROUP BY behavior is non-standard by default. Columns in SELECT that are not in GROUP BY and not aggregated return indeterminate values. Enable
ONLY_FULL_GROUP_BYSQL mode to get standard behavior. - MySQL replication is simpler than PostgreSQL's but has its own issues. Statement-based replication (SBR) can produce different results on replicas for non-deterministic functions. Row-based replication (RBR) is safer but produces larger binary logs.
- The MySQL ecosystem (Percona XtraBackup, ProxySQL, Orchestrator) is mature but separate from the MySQL core. You assemble your own toolchain rather than using built-in features.
- MySQL's InnoDB buffer pool is the most important tuning parameter. Set it to 70-80% of available RAM and most performance problems go away.
- EXPLAIN output in MySQL is simpler than PostgreSQL's but also less informative. Complex query plans require EXPLAIN ANALYZE (MySQL 8.0.18+).
Aurora¶
- Aurora pricing is confusing. You pay for compute (instance hours), storage (per GB/month), I/O (per million I/O operations), and data transfer. The I/O cost is the wildcard — write-heavy workloads can have I/O costs exceeding compute costs.
- Aurora I/O-Optimized (released 2023) eliminates per-I/O charges in exchange for higher storage and compute costs. For write-heavy workloads, this can cut costs 40%+. But you must opt in — the default pricing model penalizes writes.
- Aurora is "PostgreSQL/MySQL compatible" but not identical. Some PostgreSQL extensions are not available, some MySQL features behave differently, and Aurora-specific behaviors can surprise you.
- Aurora Serverless v2 scales compute automatically but has minimum capacity settings. If your minimum is too low, cold start queries are slow. If it is too high, you pay for idle capacity.
- Aurora Global Database (cross-region replication) has replication lag — typically under 1 second but can spike under high write load. Applications reading from global secondary regions must tolerate eventual consistency.
- Migration from RDS PostgreSQL to Aurora PostgreSQL is straightforward (snapshot + restore), but migration from self-hosted PostgreSQL to Aurora requires more planning (version compatibility, extension support).
- Aurora's storage auto-scaling is great going up but does NOT auto-shrink. If your database grows to 2TB and you delete data, you still pay for 2TB until the storage is reclaimed (which is not automatic).
Migration Pain Assessment¶
| From → To | Effort | Risk | Timeline |
|---|---|---|---|
| MySQL → PostgreSQL | High | Medium-High | 2-6 months |
| PostgreSQL → MySQL | High | High | 3-6 months (losing features) |
| PostgreSQL → Aurora PG | Low-Medium | Low | 1-2 weeks |
| MySQL → Aurora MySQL | Low-Medium | Low | 1-2 weeks |
| Self-hosted → RDS | Medium | Low-Medium | 2-4 weeks |
| RDS → Aurora | Low | Low | Hours (snapshot restore) |
| Aurora → self-hosted | Medium-High | Medium | 1-2 months |
Database migrations are among the riskiest infrastructure changes. The schema transfers easily; the hard parts are application query compatibility, stored procedure translation, and ORM behavior differences. Always run both databases in parallel with dual-write and read-comparison testing.
The Interview Answer¶
"PostgreSQL is my default relational database because it is the most standards-compliant, has the richest extension ecosystem, and handles the widest range of workloads — OLTP, JSONB document storage, geospatial with PostGIS, and time-series with TimescaleDB. On AWS, Aurora PostgreSQL adds auto-scaling storage and fast failover at the cost of I/O-based pricing complexity. MySQL is perfectly fine for read-heavy web applications but PostgreSQL's superiority in SQL compliance, JSON support, and extensibility makes it the safer default for new projects. The key operational concern for PostgreSQL is connection management (use PgBouncer) and VACUUM tuning — these are the two things that bite production deployments."
Cross-References¶
- Topic Packs: PostgreSQL, Database Internals, Database Ops, MySQL Ops
- Related Comparisons: Caching, Messaging