The situation
A SaaS company runs its primary workload on Aurora PostgreSQL in us-east-1. The cluster holds 2 TB and sustains around 5,000 writes per second at peak. The board’s compliance and ransomware-resilience review has landed new requirements: a cross-region recoverable copy in eu-west-1, RPO under a minute (at most sixty seconds of writes lost in a regional disaster), RTO under an hour (from “declare failover” to “eu-west-1 accepting writes”), managed failover rather than a hand-driven runbook, and the secondary region usefully serving low-priority reads during normal operations so it isn’t a cold liability on the balance sheet.
The existing architecture is single-region; the question is which cross-region service to add and in what configuration.
What actually matters
The five requirements above are the contract. The interesting work is weighing what each one implies about the primitive underneath.
Sub-minute RPO under sustained load is tighter than it sounds. At 5,000 writes per second, a one-minute window is 300,000 writes. Whatever ships data across regions has to keep pace with that rate in real time, not catch up on a schedule. A backup job that runs hourly cannot produce sub-minute RPO by construction; even asynchronous log shipping with typical lag of “seconds” can open to minutes under a write spike. The primitive has to be one that doesn’t serialise replication through the database engine, because the engine’s own write rate competes with replication work for the same I/O budget.
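The arithmetic above generalises to any lag figure. A minimal sketch (the function names are illustrative, not from any AWS tooling) of how many writes sit in the unreplicated window at a given write rate, and whether a given lag fits the budget:

```python
PEAK_WRITES_PER_SECOND = 5_000   # from the scenario
RPO_SECONDS = 60.0               # the one-minute requirement


def writes_at_risk(writes_per_second: float, lag_seconds: float) -> int:
    """Writes committed on the primary but not yet present in the other region."""
    return round(writes_per_second * lag_seconds)


def meets_rpo(lag_seconds: float, rpo_seconds: float = RPO_SECONDS) -> bool:
    """True if the worst-case replication lag fits inside the RPO budget."""
    return lag_seconds < rpo_seconds


# The full one-minute budget at peak load.
print(writes_at_risk(PEAK_WRITES_PER_SECOND, 60))   # 300000

# An hourly backup job can be up to an hour behind, so it fails by construction.
print(meets_rpo(lag_seconds=3600))                   # False
```

The useful part is the second function: it forces the conversation onto worst-case lag under load rather than the typical figure on a quiet day.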
Sub-hour RTO with a managed cutover is about the shape of the failure operation. “Under an hour” is generous until it isn’t: a runbook that goes “restore the latest snapshot, wait for it to finish, edit the application config, wire up IAM, rotate secrets, update DNS” is an hour on a good day and several on a bad one, because every step is human-timed. A single API call that atomically promotes a fully current secondary takes minutes. The correct primitive removes the restore step and the config-edit step entirely by having the secondary already be a live, queryable database.
A readable secondary matters commercially, not just architecturally. A warm-standby replica that the business already uses for analytics and reports pays back its cost every month; a cold one is overhead on the balance sheet that finance will eventually question. Whatever we choose should give the secondary region a role during normal operations, not only during disasters.
Operational simplicity is the quiet requirement. A hand-built CDC pipeline can meet every other bullet if we’re willing to build and maintain it, but we have to keep meeting them through schema changes, version upgrades, capacity growth, and team turnover. A managed primitive where AWS owns the replication plumbing is enormously cheaper across a few years of operation than one we own. The usual tradeoff is less flexibility; for a single Aurora-to-Aurora replication this flexibility isn’t worth paying for.
Scale headroom matters because 2 TB today is 4 TB in eighteen months. Anything that’s already groaning at 5K writes per second and 2 TB won’t be viable at twice that. Storage-layer replication that’s designed for Aurora’s internal throughput will absorb the growth; a CDC pipeline sized for today will need re-tuning every few quarters.
And there’s a sixth property nobody writes into a requirements doc but should: recovery after recovery. After a failover, the old primary eventually comes back. Does the primitive let us rejoin it cleanly as a new secondary, or is it a manual rebuild? The answer shapes how stressful the day after the incident looks.
What we’ll filter on
- Sub-minute RPO under sustained write load (thousands of writes per second).
- Sub-hour RTO via a single managed operation, not a human-driven runbook.
- Readable secondary that pays back cost during normal operations.
- Managed, not self-built: no CDC pipeline to own.
- Scales to multi-TB without per-quarter tuning.
- Clean rejoin after failover of the recovered primary.
The cross-region database replication landscape
- Aurora Global Database. Purpose-built for exactly this problem. A single logical cluster spans multiple regions: one primary (read/write) and up to 10 secondary DB clusters, each in a different AWS Region, each able to host reader instances. Replication is at the storage layer, not through the database engine. Headline numbers: typical cross-region replication latency below 1 second, RPO ~1 second under normal operation, RTO in minutes for managed failover. Write-forwarding lets applications against a secondary transparently issue writes that land on the primary. Ticks every attribute with margins to spare.
- Cross-region read replicas. An asynchronous, engine-level replica in a different region, promotable to a standalone writable instance. Lag is a function of write volume, engine, and network, typically seconds-to-minutes under sustained load. Promotion is manual via promote-read-replica, after which the replica becomes a detached read/write cluster, no longer a secondary of anything. A one-way operation with a runbook-shaped cutover.
- AWS Database Migration Service (DMS) with ongoing CDC. The generalist continuous-replication tool. Shines when the target is a different engine or shape (Aurora to Snowflake, Aurora to S3). For homogeneous Aurora-to-Aurora it’s building by hand what Global Database ships as a product. Real operational overhead: replication instances to size and patch, tasks to author and maintain, schema changes to coordinate.
- AWS Backup cross-region copy. A backup plan takes periodic snapshots and copies them to a vault in the destination region. Snapshot-based; no continuous replication. Minimum frequency is hourly. RPO equals the backup interval. Not readable between restores. The correct primitive for compliance or ransomware-resilience, with immutable vaults and cross-account isolation; an adjacent requirement, often deployed alongside live replication.
Side by side
| Mechanism | RPO < 1 min | RTO < 1 hr managed | Readable secondary | Operationally managed | Scales to multi-TB | Clean rejoin |
|---|---|---|---|---|---|---|
| Aurora Global Database | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Cross-region read replica | — | ✗ | ✓ | — | ✓ | ✗ |
| DMS ongoing CDC | — | ✗ | ✓ | ✗ | ✓ | ✗ |
| AWS Backup cross-region | ✗ | ✗ | ✗ | ✓ | ✓ | — |
One row is all ticks: Aurora Global Database. Everything below is how the survivor actually works.
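The filtering step can be written down as data. A minimal sketch that transcribes the table and keeps only the rows where every criterion is a tick; reading a dash as "partial" is an interpretation of the table, and the short criterion names are made up for the example:

```python
CRITERIA = ["rpo", "rto_managed", "readable", "managed", "scales", "rejoin"]

# "yes" = tick, "partial" = dash (assumed meaning), "no" = cross.
MATRIX = {
    "Aurora Global Database":    ["yes", "yes", "yes", "yes", "yes", "yes"],
    "Cross-region read replica": ["partial", "no", "yes", "partial", "yes", "no"],
    "DMS ongoing CDC":           ["partial", "no", "yes", "no", "yes", "no"],
    "AWS Backup cross-region":   ["no", "no", "no", "yes", "yes", "partial"],
}


def survivors(matrix):
    """Mechanisms that get a full tick on every criterion."""
    return [name for name, cells in matrix.items()
            if all(cell == "yes" for cell in cells)]


def shortfalls(matrix, name):
    """Criteria on which a mechanism gets anything less than a tick."""
    return [crit for crit, cell in zip(CRITERIA, matrix[name]) if cell != "yes"]


print(survivors(MATRIX))
print(shortfalls(MATRIX, "DMS ongoing CDC"))
```

Keeping the matrix as data makes it cheap to revisit the decision when a requirement changes: relax or add a criterion and rerun the filter.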
Matching requirements to mechanism
Aurora Global Database, in depth
Topology. A global cluster is the top-level object, identified by a global cluster ARN. Inside sit one primary Aurora DB cluster and up to 10 secondary Aurora DB clusters, each in a distinct AWS Region. Each cluster is a normal Aurora cluster, with its own DB instances (writer and readers on the primary, readers only on secondaries), cluster and reader endpoints, and independent parameter groups, security groups, and IAM roles. What’s new is replication between the primary’s storage volume and each secondary’s.
Storage-layer replication. In a normal Aurora cluster, the engine doesn’t write to local disk; it writes redo log records to a shared, six-way-replicated storage volume distributed across three AZs, and readers attach to the same volume. For Global Database, the primary’s storage volume asynchronously ships those same redo records across AWS’s global infrastructure to each secondary’s storage volume. The secondary’s storage applies the redo and presents the results to its readers.
The engine is out of the replication path. It doesn’t execute statements twice, doesn’t serialise through a replication process, doesn’t rely on binary logs competing with user queries for I/O. Replication is a storage primitive on dedicated servers. That’s what lets typical lag run below one second at write rates well above the scenario’s 5K/s. An RPO of ~1 second is roughly 60 times better than the scenario’s 60-second requirement.
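That margin is worth watching rather than assuming. A minimal sketch that compares observed lag samples against the RPO budget, assuming the samples arrive in milliseconds (to my knowledge Aurora publishes a CloudWatch metric named AuroraGlobalDBReplicationLag in that unit; fetching it is left out here). The function name and the warning threshold are hypothetical:

```python
from statistics import mean

RPO_BUDGET_MS = 60_000  # the scenario's sixty-second RPO


def lag_report(samples_ms, budget_ms=RPO_BUDGET_MS, warn_fraction=0.1):
    """Summarise replication-lag samples against the RPO budget.

    warn_fraction is a hypothetical alerting threshold: warn once the worst
    sample exceeds that share of the budget, well before the RPO is breached.
    """
    if not samples_ms:
        raise ValueError("no lag samples to evaluate")
    worst = max(samples_ms)
    return {
        "mean_ms": mean(samples_ms),
        "worst_ms": worst,
        # How many times over the budget covers the worst observed lag.
        "headroom": budget_ms / worst if worst > 0 else float("inf"),
        "warn": worst > budget_ms * warn_fraction,
        "breach": worst >= budget_ms,
    }


report = lag_report([380, 410, 395, 1_000])
print(report["headroom"])  # 60.0: a one-second worst case leaves a 60x margin
print(report["warn"])      # False
```

Alerting on the worst sample rather than the mean is deliberate: RPO is set by the lag at the moment of failure, and that is not guaranteed to be a typical moment.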
The secondary as a reader. Each secondary hosts its own reader instances, exposed via its own reader endpoint. Applications in eu-west-1 can query fresh data (less than a second stale on average) without the inter-region round trip. This is where the scenario’s “low-priority reads on the secondary” lives.
Write-forwarding. A secondary is read-only by default. Write-forwarding is opt-in: writes against a secondary endpoint are transparently forwarded to the primary, executed there, and replicated back, with the session resuming on the secondary once the change is visible locally.
For Aurora PostgreSQL, write-forwarding is per-session via apg_write_forward.consistency_mode, with four modes: SESSION (default when enabled) where queries see their own writes but may see stale reads relative to other sessions; EVENTUAL for lowest latency and weakest consistency; GLOBAL where queries wait until the secondary has caught up to all committed changes; OFF which disables it.
Supported in Aurora PostgreSQL 14.9+, 15.4+, and 16+. DDL is not forwarded, nor are COPY, VACUUM, TRUNCATE, cursors, SAVEPOINT, or SERIALIZABLE isolation.
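Opting a session in is a matter of setting that parameter before issuing writes. A minimal sketch, assuming write-forwarding is already enabled on the secondary cluster and that the caller supplies a DB-API cursor from whichever PostgreSQL driver is in use; the helper names are hypothetical:

```python
ALLOWED_MODES = ("SESSION", "EVENTUAL", "GLOBAL", "OFF")  # the four modes named above


def consistency_mode_sql(mode: str) -> str:
    """Build the SET statement for this session's write-forwarding mode."""
    normalised = mode.strip().upper()
    if normalised not in ALLOWED_MODES:
        raise ValueError(f"unknown consistency mode: {mode!r}")
    # The value comes from a fixed allow-list, so interpolating it is safe.
    return f"SET apg_write_forward.consistency_mode = '{normalised}'"


def enable_write_forwarding(cursor, mode: str = "SESSION") -> None:
    """Run the SET on a DB-API cursor connected to a secondary cluster."""
    cursor.execute(consistency_mode_sql(mode))


print(consistency_mode_sql("eventual"))
```

Validating against an allow-list keeps a typo in application config from silently leaving a session in an unintended mode.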
Managed switchover (planned). The switchover-global-cluster API promotes a nominated secondary to primary in a controlled, data-loss-free operation. The primary stops accepting writes, replication drains until the target secondary has fully caught up, and only then does the role swap happen. RPO 0 by construction. RTO minutes. Used for planned region rotations, operational maintenance, follow-the-sun architectures, DR drills.
Managed failover (unplanned). The failover-global-cluster API is for when the primary region is unreachable and waiting for replication to drain isn’t an option. The distinguishing flag is --allow-data-loss. RPO equals the unreplicated tail at the moment of failure; for healthy global clusters, that is typically a handful of seconds. RTO minutes.
Both commands operate against the global cluster identifier. That’s what makes them managed. Aurora orchestrates the role change, endpoint transitions, and cluster-state updates as one operation.
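The two operations differ by one decision: is this planned or not. A minimal sketch of that branch, assuming `rds` is a boto3 RDS client created by the caller (for example with `boto3.client("rds")`) and using the request parameter names as I understand them from the SDK; check them against the current reference before relying on this. The wrapper function itself is hypothetical:

```python
def change_primary(rds, global_cluster_id: str, target_cluster_arn: str,
                   planned: bool):
    """Promote target_cluster_arn to primary of the given global cluster.

    rds is passed in rather than created here, so this sketch carries no
    import-time dependency on boto3 and can be exercised with a stub.
    """
    if planned:
        # Planned: Aurora drains replication first, so no writes are lost.
        return rds.switchover_global_cluster(
            GlobalClusterIdentifier=global_cluster_id,
            TargetDbClusterIdentifier=target_cluster_arn,
        )
    # Unplanned: the primary region is unreachable, so the caller explicitly
    # accepts losing whatever had not yet replicated.
    return rds.failover_global_cluster(
        GlobalClusterIdentifier=global_cluster_id,
        TargetDbClusterIdentifier=target_cluster_arn,
        AllowDataLoss=True,
    )
```

Making `planned` a required argument is a small design choice: nobody can reach the data-loss path by forgetting a default.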
A worked failover trace
14:02 UTC on a Tuesday. us-east-1 is degraded; eu-west-1 is fine; the platform team has declared a regional failover.
T-∞, steady state. Primary greenbox-us in us-east-1, one writer and two readers. Secondary greenbox-eu in eu-west-1, two readers. Replication lag averages ~400 ms. Low-priority analytics already run against the eu-west-1 reader endpoint.
T+0, us-east-1 deemed unrecoverable. Incident commander authorises failover. failover-global-cluster --allow-data-loss runs against greenbox-global, naming the eu-west-1 cluster as the promotion target.
T+0 to T+several minutes, promotion. The eu-west-1 cluster transitions from secondary to primary. Its writer endpoint, whose DNS name is unchanged, begins accepting writes. Unreplicated writes from the old primary are lost; at 5K writes/sec with typical sub-second lag, that’s on the order of a few thousand writes.
T+minutes, application cutover. The application’s write connection string is repointed from us-east-1 to the eu-west-1 writer endpoint via a config flag or a Route 53 CNAME swing. Applications using write-forwarding from the eu-west-1 cluster endpoints already have writes working, because the cluster they connect to is now the primary.
T+~20 minutes, recovered. RPO: the handful of unreplicated seconds at T+0. RTO: typically ten to twenty minutes wall-clock, inside the one-hour ceiling by a factor of three or more.
Post-recovery. us-east-1 comes back hours later. The old cluster is detached and stale. Rejoining it as a secondary of the new primary reseeds its storage volume from eu-west-1 as a background operation with no application impact. A later planned switchover-global-cluster returns the team to their “normal” topology at zero RPO on a Tuesday morning when nothing is on fire.
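The trace’s numbers can be checked directly. A minimal sketch using the figures quoted in the trace (5K writes/sec, ~400 ms lag, the twenty-minute end of the RTO range); the calendar date is a placeholder chosen only because it falls on a Tuesday:

```python
from datetime import datetime, timedelta, timezone

WRITES_PER_SECOND = 5_000
LAG_AT_FAILURE = timedelta(milliseconds=400)   # steady-state average from the trace
RTO_CEILING = timedelta(hours=1)

# 14:02 UTC comes from the trace; the date itself is hypothetical.
declared = datetime(2024, 1, 2, 14, 2, tzinfo=timezone.utc)
accepting_writes = declared + timedelta(minutes=20)   # slow end of the trace's range

lost_writes = round(WRITES_PER_SECOND * LAG_AT_FAILURE.total_seconds())
rto = accepting_writes - declared

print(lost_writes)           # 2000
print(rto <= RTO_CEILING)    # True
print(RTO_CEILING / rto)     # 3.0
```

Recording the two timestamps during a real drill and running the same subtraction gives a measured RTO to put next to the requirement.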
What’s worth remembering
- Aurora Global Database is the purpose-built answer for sub-minute RPO and minutes-scale RTO with managed cross-region failover on Aurora: typical lag below 1 second, RPO ~1 second, RTO minutes, up to 10 secondary regions.
- Replication is at the storage layer, not the engine layer. That’s why the numbers hold at sustained write rates in the thousands per second without stressing the primary.
- Switchover and failover are different APIs. switchover-global-cluster is the planned, zero-RPO operation; failover-global-cluster --allow-data-loss is the unplanned one.
- Write-forwarding is per-session, opt-in, with consistency modes. SESSION is the default when enabled, EVENTUAL trades consistency for latency, GLOBAL trades latency for consistency.
- Secondary clusters are readable by default; each hosts reader instances serving read-only traffic with sub-second freshness.
- Cross-region read replicas are an older, asynchronous, engine-level mechanism with variable lag and manual promotion. Too loose on RPO and self-managed on promotion for this shape.
- DMS is the correct answer for heterogeneous replication. Homogeneous Aurora-to-Aurora with managed failover isn’t its job.
- AWS Backup cross-region copy solves the ransomware requirement with hourly-to-daily RPOs, immutable vaults, and cross-account isolation. Deploy it alongside, not instead.
- The global cluster ARN is what makes operations managed: switchover and failover operate against one identifier as a single orchestrated operation.
- Rejoining the old primary after failover reseeds its storage volume from the new primary as a background operation with no application impact.