Choosing Between Aurora Replicas, RDS Read Replicas, and DMS

October 07, 2026 · 16 min read

Solutions Architect · SAA-C03 · part of The Exam Room

The situation

Three databases, three read-scale problems.

  • Aurora PostgreSQL production OLTP, the core transactional store for the SaaS. Writer: db.r6g.2xlarge. Read traffic is currently mixed in with writes on the writer. The reporting team’s Tableau dashboards hit the writer directly with heavyweight queries during business hours. Writer CPU sits at 90% during the afternoon, and the occasional query-from-hell causes five-minute stalls. Need: offload analytical reads without changing the application’s connection strings drastically.
  • RDS MySQL production, legacy shop, 8 TB database, used by an older application that can’t easily change its data-access patterns. Nightly batch jobs read entire tables for offline analytics; these jobs hold consistent-read transactions for hours and cause replication lag to creep upward on downstream systems. Need: move the batch reads off the writer without restructuring the batch code.
  • Aurora PostgreSQL multi-Region, a newer service with a regulatory requirement that analytics in Sydney happen against infrastructure physically in Sydney. The writer is in eu-central-1 for product-team proximity. Cross-Region replication lag of up to a minute is acceptable; the reports are day-scale, not real-time. Need: a readable copy in ap-southeast-2 with sub-60-second replication lag.

Each needs a “read-only copy”. But “read-only copy” can mean different things.

What actually matters

Before comparing replication options it’s worth asking what each workload actually needs from a replica.

The Aurora OLTP workload needs low replication lag (seconds, ideally milliseconds) because reporting queries run against recent transactional data: a dashboard showing “orders in the last hour” doesn’t tolerate a replica that’s an hour behind. It also needs same-engine semantics so the reporting team’s queries run unmodified. Aurora’s native replicas, sharing the same distributed storage as the writer, deliver both: replica lag is typically sub-20ms because replicas read from the same storage pages as the writer.

The RDS MySQL workload needs query isolation more than freshness. The batch jobs can tolerate a few minutes of lag; what they can’t tolerate is locking the writer. RDS read replicas (binlog-based asynchronous replication) provide enough isolation: the replica runs on its own EBS volume, its own instance, and its own buffer pool. Batch jobs run there and the writer is untouched.

The multi-Region Aurora workload needs cross-Region replication with sub-minute lag. Aurora Global Database is the service name for this. It replicates the storage layer across Regions using a purpose-built replication fabric, with typical replication lag under 1 second. Treat that figure as typical behaviour rather than a guarantee, and measure it under your own write volume. Writable in the primary Region; read-only in the secondary Regions until failover.

DMS, the Database Migration Service, is adjacent. DMS does logical replication between databases (even between different engines): it reads binlogs or WAL and applies the changes to the target. It’s useful for heterogeneous replication (MySQL → PostgreSQL, Oracle → Aurora), for migrations, and occasionally for continuous replication between systems that can’t use native replication. For same-engine replicas, DMS is almost always the wrong answer; native replication is faster, cheaper, and simpler.

Second question: how far apart are writer and reader, logically and physically? Same cluster, different instance? Aurora replica or RDS read replica. Different Region? Aurora Global Database, or cross-Region read replica (RDS), or DMS. Different engine entirely? DMS.

Third: writable at the replica? Aurora replicas, RDS read replicas, and Aurora Global Database secondaries are read-only until they are promoted (an Aurora Global secondary, for example, becomes a writer during failover). DMS targets are writable by design: DMS is a one-way pipe, and the target database is a normal database you can write to outside of the DMS-managed tables.

Fourth: schema changes. Aurora replicas and Aurora Global Database replicate schema changes automatically (they share storage or a tightly-coupled replication stream). RDS read replicas replicate DDL too, with the caveat that some schema operations briefly block the binlog stream. DMS is weaker here: during CDC it propagates only a limited set of DDL and does not carry over objects such as secondary indexes, so you have to handle schema evolution on the target separately, and DMS can get confused by unexpected schema on either side.

Fifth: replica count and failover promotion. Aurora supports up to 15 read replicas per cluster, and failover to a replica typically completes in seconds; RDS supports up to 15 read replicas (MySQL, PostgreSQL) with promotion via a manual API call that takes minutes. Aurora Global Database supports a secondary per Region; one secondary can be promoted to primary in the event of a Regional failure.

Sixth: cost. Aurora replicas cost roughly the same per instance as the writer; Aurora shares storage, so no duplicate storage cost. RDS read replicas cost per instance plus their own EBS volume (so storage doubles). Aurora Global Database has a per-Region pricing uplift plus cross-Region data transfer. DMS has a replication instance cost plus the source and target DB costs, and often a higher network bill if it runs at scale.
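The cost shapes above can be sketched as arithmetic. This is only an illustration: the unit prices below are made-up placeholders (not AWS list prices), the names UnitPrices, aurora_replica_monthly and rds_replica_monthly are invented for this sketch, and Global Database and DMS are left out because their fees depend on traffic volume.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common billing approximation


@dataclass(frozen=True)
class UnitPrices:
    """Hypothetical unit prices; replace with real figures for your Region."""
    instance_hour: float     # price of one replica instance-hour
    storage_gb_month: float  # price of one GB-month of storage


def aurora_replica_monthly(prices: UnitPrices, replicas: int) -> float:
    """Aurora replicas share cluster storage, so only instance-hours are added."""
    return replicas * prices.instance_hour * HOURS_PER_MONTH


def rds_replica_monthly(prices: UnitPrices, replicas: int, storage_gb: float) -> float:
    """Each RDS read replica pays for its instance and a full copy of the data."""
    per_replica = (prices.instance_hour * HOURS_PER_MONTH
                   + storage_gb * prices.storage_gb_month)
    return replicas * per_replica


if __name__ == "__main__":
    prices = UnitPrices(instance_hour=1.0, storage_gb_month=0.125)  # made-up numbers
    print("2 Aurora replicas:", aurora_replica_monthly(prices, replicas=2))
    print("1 RDS replica, 8 TB:", rds_replica_monthly(prices, replicas=1, storage_gb=8000))
```

The point of the sketch is the shape, not the totals: the Aurora function has no storage term at all, while the RDS function scales storage with the replica count.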

What we’ll filter on

  1. Replication mechanism: shared storage, binlog/WAL, or logical replication via DMS?
  2. Replication lag: sub-second, sub-minute, or multi-minute?
  3. Same engine or heterogeneous: can source and target be different engines?
  4. Multi-Region: is cross-Region reading supported?
  5. Failover and promotion behaviour: how long, how automated?
  6. Cost shape: per-instance, per-TB, or per-DMS-hour?
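As a rough sketch, the first filters can be written down as a decision function. The function name, its parameters, and the returned labels are invented for illustration, and the rules are a simplification of the reasoning in this post rather than an exhaustive decision tree (for instance, it assumes Zero-ETL is available whenever the source is Aurora, which may not hold for every combination).

```python
def pick_replication(source_engine: str,
                     target_engine: str,
                     source_is_aurora: bool,
                     cross_region: bool,
                     target_is_warehouse: bool = False) -> str:
    """Return the replication option this post would reach for first."""
    if target_is_warehouse:
        # Analytics target rather than a SQL read replica.
        return "Zero-ETL" if source_is_aurora else "DMS"
    if source_engine != target_engine:
        # Heterogeneous replication has no native path.
        return "DMS"
    if source_is_aurora:
        return "Aurora Global Database" if cross_region else "Aurora Replica"
    # Classic RDS engines: same mechanism in-Region or cross-Region.
    return "RDS Read Replica"


if __name__ == "__main__":
    # The three workloads from "The situation".
    print(pick_replication("postgresql", "postgresql", True, False))
    print(pick_replication("mysql", "mysql", False, False))
    print(pick_replication("postgresql", "postgresql", True, True))
```

Lag, failover behaviour, and cost are deliberately left out of the function; they decide between candidates only after mechanism, engine, and Region have narrowed the field.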

The replication landscape

  1. Aurora Replica. Aurora-specific. Replicas share the writer’s distributed storage. Replication lag typically under 20 ms because replicas read from the same storage pages. Supports up to 15 per cluster, each on its own instance class (can be smaller than writer). Integrates with the cluster’s reader endpoint (load-balanced across replicas) and custom endpoints (named subsets of replicas for workload isolation). Failover to a replica completes in seconds.

  2. RDS Read Replica. Classic engines (MySQL, MariaDB, PostgreSQL, Oracle). Replication is via the engine’s native asynchronous mechanism (MySQL binlog, PostgreSQL streaming replication). Each replica is an independent instance with its own storage; storage cost duplicates. Replication lag seconds to minutes depending on writer load and replica class. Up to 15 per primary (MySQL/PostgreSQL). Read replicas can be cross-Region.

  3. Aurora Global Database. Multi-Region Aurora. One primary Region (writable), up to 5 secondary Regions (read-only). Replication via a purpose-built storage-level fabric, typically sub-second lag across Regions. A secondary Region can be promoted to primary in minutes during a Regional failover. Each secondary Region has its own cluster (reader instances only, until promoted). Cost includes a cross-Region replication fee.

  4. DMS continuous replication. Change Data Capture from source to target. Source and target can be the same engine (MySQL → MySQL) or different engines (PostgreSQL → Aurora MySQL, Oracle → PostgreSQL, SQL Server → Aurora). Replication runs on a DMS replication instance (EC2-class pricing); lag depends on workload and can drift under high write rates. Schema conversion via AWS Schema Conversion Tool (SCT) is a separate step when engines differ.

  5. Logical replication at the engine level. PostgreSQL’s pglogical, MySQL’s binlog-based GTID replication to external consumers, or third-party tools like Debezium. These bypass AWS-managed replication entirely and hand the data to downstream systems. Useful for streaming to Kafka, warehouses, or other analytics stores; not the answer for “I want a read replica of my database”.

  6. Aurora Zero-ETL integration. Newer: Aurora writer → Redshift or OpenSearch without DMS. Changes stream continuously to the analytics target. Not a read replica in the traditional sense, but serves the same “offload analytics” role when the target is a data warehouse rather than a SQL read-replica.

Side by side

Option | Mechanism | Lag | Cross-engine | Cross-Region | Cost
Aurora Replica | shared storage | sub-20ms | no | no | instance only (no extra storage)
RDS Read Replica | binlog / WAL streaming | seconds | no | yes | instance + storage
Aurora Global Database | dedicated fabric | sub-1s | no | yes | instance + cross-Region fee
DMS | logical CDC | seconds to minutes | yes | yes | DMS instance + network + target
Zero-ETL | managed stream | seconds to minutes | ✓ (to warehouse) | not stated | data transfer + target

Reading the table by workload:

  • Aurora OLTP with reporting workload: Aurora Replicas (2 of them, on a dedicated custom endpoint for reporting). Sub-20ms lag, no extra storage cost, and reporting traffic routed through its own endpoint. The reporting replicas can use smaller instances than the writer.
  • RDS MySQL with batch jobs: RDS read replica. Replicates via binlog, independent instance and storage. Batch jobs point at the replica endpoint; writer stays isolated.
  • Aurora multi-Region: Aurora Global Database with a secondary in ap-southeast-2. Sub-1s cross-Region lag, Sydney reads hit local infrastructure, Frankfurt keeps the write path.

The replication mechanism diagram

[Diagram: three panels. (1) Aurora Replica, shared storage: a db.r6g.2xlarge writer and three replicas (reporting endpoint, reader endpoint, failover target) all read from the same Aurora distributed storage (6 copies across 3 AZs); lag < 20 ms. (2) RDS Read Replica, binlog / WAL streaming: an RDS MySQL primary on an 8 TB gp3 EBS volume with binlog enabled replicates asynchronously to an independent replica with its own 8 TB gp3 volume, serving the batch endpoint; lag ~1-10 s; storage is 2× the primary. (3) Aurora Global Database, dedicated cross-Region fabric: the primary cluster in eu-central-1 (writer + replicas, read + write) replicates at the storage level to a secondary cluster in ap-southeast-2 that is read-only until promoted; typical lag < 1 s.]
Three replication mechanisms, three different places where the copy lives. Aurora replicas share storage; RDS replicas are independent instances with their own volumes; Global Database uses a purpose-built cross-Region fabric.

The picks in depth

Aurora OLTP → two Aurora replicas on a custom endpoint for reporting. The reporting team gets a dedicated endpoint that routes only to the two reporting replicas. One caveat: the cluster’s built-in reader endpoint balances across every replica, including those two, so the application should read through its own custom endpoint that excludes them for the isolation to hold. With that in place, if a runaway Tableau query pins one reporting replica, the other absorbs the load and the application’s reads are unaffected.

aws rds create-db-cluster-endpoint \
    --db-cluster-identifier prod-aurora \
    --db-cluster-endpoint-identifier reporting \
    --endpoint-type READER \
    --static-members prod-aurora-replica-1 prod-aurora-replica-2

Connection string changes for the reporting team: from prod-aurora.cluster-xxx.eu-central-1.rds.amazonaws.com to reporting.cluster-custom-xxx.eu-central-1.rds.amazonaws.com. No schema changes, no data movement; the replicas are backed by the same storage as the writer. Writer CPU drops from 90% to ~40% the day the reporting team cuts over.
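The same endpoint can be sketched with boto3. The cluster’s built-in reader endpoint spreads connections across all replicas, so this sketch also builds a second custom endpoint for the application that excludes the reporting replicas. The endpoint name app-readers and the helper function names are assumptions made for illustration; the parameter names follow the RDS CreateDBClusterEndpoint API.

```python
def reporting_endpoint_params(cluster_id: str, endpoint_id: str, members: list) -> dict:
    """Custom READER endpoint pinned to an explicit list of replicas."""
    return {
        "DBClusterIdentifier": cluster_id,
        "DBClusterEndpointIdentifier": endpoint_id,
        "EndpointType": "READER",
        "StaticMembers": list(members),
    }


def app_endpoint_params(cluster_id: str, endpoint_id: str, excluded: list) -> dict:
    """Custom READER endpoint covering every replica except the excluded ones."""
    return {
        "DBClusterIdentifier": cluster_id,
        "DBClusterEndpointIdentifier": endpoint_id,
        "EndpointType": "READER",
        "ExcludedMembers": list(excluded),
    }


def create_endpoint(params: dict, region: str = "eu-central-1") -> dict:
    """Send the request. boto3 is imported lazily so the builders run without it."""
    import boto3
    return boto3.client("rds", region_name=region).create_db_cluster_endpoint(**params)


if __name__ == "__main__":
    reporting_replicas = ["prod-aurora-replica-1", "prod-aurora-replica-2"]
    print(reporting_endpoint_params("prod-aurora", "reporting", reporting_replicas))
    # "app-readers" is a hypothetical name for the application's endpoint.
    print(app_endpoint_params("prod-aurora", "app-readers", reporting_replicas))
```

StaticMembers and ExcludedMembers are alternatives: an endpoint lists either the instances it contains or the instances it leaves out, not both.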

RDS MySQL → one cross-AZ read replica for batch. Created with CreateDBInstanceReadReplica:

aws rds create-db-instance-read-replica \
    --db-instance-identifier prod-mysql-batch-replica \
    --source-db-instance-identifier prod-mysql \
    --db-instance-class db.r6g.2xlarge \
    --availability-zone eu-central-1b \
    --no-publicly-accessible

The replica carries its own EBS volume that replicates the 8 TB primary; initial snapshot restore takes a couple of hours, then binlog streaming catches up. Replication lag settles to ~1-5 seconds under the primary’s write load. Batch jobs point at prod-mysql-batch-replica.xxx.eu-central-1.rds.amazonaws.com; the primary sees no batch traffic at all.

Monitoring: ReplicaLag CloudWatch metric with an alarm at 60 seconds. If lag creeps up, it usually means the replica’s instance class or storage IOPS is undersized; resize as needed without affecting the primary.
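A minimal sketch of that alarm, expressed as the parameters for CloudWatch’s PutMetricAlarm call. The alarm name, the five-period evaluation window, and the choice to treat missing data as breaching are assumptions for illustration, not settings taken from this setup; ReplicaLag is reported in seconds.

```python
def replica_lag_alarm_params(instance_id: str,
                             threshold_seconds: float = 60.0,
                             period_seconds: int = 60,
                             evaluation_periods: int = 5) -> dict:
    """Build PutMetricAlarm parameters for an RDS read replica's ReplicaLag."""
    return {
        "AlarmName": f"{instance_id}-replica-lag",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: a replica that stops reporting lag is worth paging on.
        "TreatMissingData": "breaching",
    }


def put_alarm(params: dict, region: str = "eu-central-1") -> None:
    """Send the request. boto3 is imported lazily so the builder runs without it."""
    import boto3
    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)


if __name__ == "__main__":
    print(replica_lag_alarm_params("prod-mysql-batch-replica"))
```

Requiring several consecutive breaching periods keeps a single slow batch transaction from paging anyone, at the price of a few minutes of detection delay.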

Aurora multi-Region → Global Database with Sydney secondary. Adding a secondary Region to an existing Aurora cluster:

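# Run in the primary Region. --source-db-cluster-identifier takes the cluster
# ARN; 123456789012 is a placeholder account ID.
aws rds create-global-cluster \
    --region eu-central-1 \
    --global-cluster-identifier prod-global \
    --source-db-cluster-identifier arn:aws:rds:eu-central-1:123456789012:cluster:prod-aurora

# Run in the secondary Region. Add --engine-version if the default there
# does not match the primary.
aws rds create-db-cluster \
    --region ap-southeast-2 \
    --db-cluster-identifier prod-aurora-apse2 \
    --global-cluster-identifier prod-global \
    --engine aurora-postgresql \
    --source-region eu-central-1

aws rds create-db-instance \
    --region ap-southeast-2 \
    --db-instance-identifier prod-aurora-apse2-1 \
    --db-cluster-identifier prod-aurora-apse2 \
    --db-instance-class db.r6g.2xlarge \
    --engine aurora-postgresql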

First command creates the Global Cluster wrapper; second adds a secondary cluster in ap-southeast-2; third adds a reader instance to the secondary cluster. Sydney clients connect to prod-aurora-apse2.cluster-ro-xxx.ap-southeast-2.rds.amazonaws.com for reads. Writes continue to go to eu-central-1.

Lag is typically under a second; measurable via AuroraGlobalDBReplicationLag in CloudWatch. In a Regional failover scenario, FailoverGlobalCluster promotes the Sydney cluster to primary; the eu-central-1 cluster becomes a secondary. Failover time is on the order of minutes.
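To check the Sydney requirement against real measurements, a small helper can compare lag samples with the 60-second budget. This is a sketch under assumptions: the function name and the sample values are invented, and it expects the samples to have been fetched already (AuroraGlobalDBReplicationLag is reported in milliseconds).

```python
def lag_budget_report(samples_ms: list, budget_seconds: float = 60.0) -> dict:
    """Summarise AuroraGlobalDBReplicationLag samples against a lag budget."""
    if not samples_ms:
        raise ValueError("no lag samples supplied")
    budget_ms = budget_seconds * 1000
    breaches = sum(1 for sample in samples_ms if sample > budget_ms)
    return {
        "worst_seconds": max(samples_ms) / 1000,
        "breaches": breaches,
        "within_budget": breaches == 0,
    }


if __name__ == "__main__":
    # Invented sample values, in milliseconds.
    print(lag_budget_report([420, 850, 1300]))
    print(lag_budget_report([420, 75000, 1300]))
```

Counting breaches rather than averaging matters here: a mean can look healthy while a single multi-minute spike still violates the requirement.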

When DMS is the correct answer

DMS is the wrong tool for same-engine read replicas, but the correct tool for:

  1. Cross-engine replication. Migrating Oracle to PostgreSQL while both run in parallel during cutover; replicating MySQL to Aurora MySQL (where native replication still works but DMS provides migration-specific tooling); continuous CDC from SQL Server to Redshift for analytics.
  2. One-way pipes into analytics targets. Source is MySQL; target is Redshift. Native replication isn’t an option (different engines), and Zero-ETL may not be available for the specific combination. DMS with CDC keeps the analytics store within a few minutes of the source.
  3. Selective replication. Replicate only certain tables or apply filtering rules during replication. Native replication is all-or-nothing on a schema; DMS replication tasks can be scoped.
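Selective replication is expressed in DMS as a table-mapping document of selection rules. The sketch below builds one; the schema name shop, the table names, and the rule-naming scheme are hypothetical, while the rule keys follow the DMS table-mapping format.

```python
import json


def selection_rules(schema: str, tables: list) -> str:
    """Build a DMS table-mapping JSON document that includes only the given tables."""
    rules = []
    for rule_id, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(rule_id),          # must be unique within the mapping
            "rule-name": f"include-{table}",  # hypothetical naming scheme
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules}, indent=2)


if __name__ == "__main__":
    # "shop", "orders" and "order_items" are made-up names for illustration.
    print(selection_rules("shop", ["orders", "order_items"]))
```

The resulting JSON is what a replication task takes as its table mappings; tables not named by an include rule are simply not replicated.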

For same-engine same-family workloads, DMS adds cost, latency, and complexity with no benefit over native replication. When someone suggests DMS for a same-engine read replica, the correct question is “why not just use the native replication path?”, and the answer should be compelling.

A worked trace: reporting query on the Aurora replica

Ravi writes a heavy reporting query and runs it against the new reporting endpoint. The writer’s CPU dashboard stays flat; the replica’s CPU briefly spikes to 80%.

-- Connected to: reporting.cluster-custom-xxx.eu-central-1.rds.amazonaws.com
SELECT date_trunc('hour', created_at) AS hour,
       COUNT(*) AS orders,
       SUM(total_cents) / 100.0 AS revenue
  FROM orders
 WHERE created_at >= NOW() - INTERVAL '7 days'
 GROUP BY 1
 ORDER BY 1;

-- 168 rows
-- Execution time: 4.2 seconds

CloudWatch (eu-central-1):
  prod-aurora (writer):          CPU 42% (stable)
  prod-aurora-replica-1:         CPU 78% (briefly)
  prod-aurora-replica-2:         CPU 5%
  AuroraReplicaLag (replica-1):  12 ms
  AuroraReplicaLag (replica-2):  9 ms

Writer unaffected; replica-1 absorbs the query; replica-2 available for the next one. Replication lag stays comfortably under 20 ms because both replicas are reading from the same storage pages as the writer and don’t have a separate replay process to catch up.

If Ravi had run the same query against the old setup, direct to the writer, the writer’s CPU would have jumped and every user-facing query during those 4.2 seconds would have been affected.

What’s worth remembering

  1. Aurora replicas share storage with the writer. Sub-20ms lag, no duplicated storage cost, up to 15 replicas per cluster. Use custom endpoints to isolate workloads across replicas.
  2. RDS read replicas are independent instances with their own storage. Binlog or WAL streaming; lag is workload-dependent but typically seconds. Storage cost duplicates; worth it for the isolation.
  3. Aurora Global Database is the multi-Region answer. Sub-second typical lag via a dedicated fabric. Secondaries are read-only; promotion to primary is minutes.
  4. DMS is for heterogeneous replication. Cross-engine, cross-platform, migration scenarios. Same-engine same-family is almost always cheaper and simpler with native replication.
  5. Writer is protected by moving reads away. The real win is not the replica’s raw performance; it’s that the writer’s CPU, buffer pool, and query planner aren’t contending with analytical queries.
  6. Replication lag is the metric to watch. AuroraReplicaLag (Aurora), ReplicaLag (RDS), AuroraGlobalDBReplicationLag (Global). Alarm early; lag that grows unbounded means the replica is undersized or the writer is overwhelming its replication pipe.
  7. Custom endpoints isolate replica workloads. A reporting endpoint with its own replicas means a runaway Tableau query can’t affect the replicas serving the application, provided the application’s endpoint excludes the reporting replicas. Cheap insurance.
  8. Zero-ETL skips the replica entirely for warehouse targets. If the goal is analytics-in-Redshift rather than read-replica-for-Tableau, Aurora → Redshift Zero-ETL is the newer, simpler answer: no DMS, no self-managed pipeline.

Three read-scale problems, three replication mechanisms. Aurora replicas for in-cluster offload with sub-20ms lag; RDS read replicas for independent-instance isolation at the cost of duplicated storage; Aurora Global Database for cross-Region reads with sub-second lag. DMS sits alongside all three for the heterogeneous cases where none of the native paths applies.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.