Designing RDS for AZ Failover and Cross-Region Recovery

September 07, 2026 · 17 min read

The situation

A SaaS company runs a production database on Amazon RDS for PostgreSQL (not Aurora) in us-east-1. Today it’s a single-AZ deployment: one DB instance, one AZ, automated backups on. The business has landed two requirements: protection against AZ-level events with sub-minute RTO, and protection against region-level events with RPO ≤ 5 minutes and RTO ≤ 1 hour via recovery in us-west-2.

Constraints on the solution. Operating cost should stay reasonable: doubling spend is acceptable, tripling it for marginal benefit is not. Operational burden should stay low: no bespoke replication plumbing and no manual AZ-failover drills, though a manual regional promotion is tolerated because region failovers are rare and always human-driven anyway. The PostgreSQL engine stays as-is; no migration to Aurora.

What actually matters

Two failure scopes, two time budgets, and one relational database that has to be protected against both. Before reaching for a feature, it’s worth pulling the problem apart.

The first consideration is what “AZ-level protection” buys and what it doesn’t. An in-region high-availability shape protects against one AZ going dark while the rest of the Region is fine. It does not protect against a Region-wide event, a bad write, a DROP TABLE, or a corruption that gets replicated synchronously to a standby before anyone notices. It’s the correct tool for AZ power loss, network partitions inside the Region, hardware failures on the primary, not for the other categories. A team that conflates AZ-level HA with “disaster recovery” will underinvest in backups and overinvest in standbys.

The second is synchronous vs asynchronous replication and what that choice costs. Synchronous replication means commits wait for the standby to ack before returning. RPO is zero, but write latency includes the cross-AZ hop, and a slow standby can stall the primary. Semi-synchronous relaxes this: the commit waits for at least one of two standbys to ack, so a slow node doesn’t block every commit. Asynchronous means commits return immediately and the replica catches up on its own time. RPO is measured in seconds under load, write latency isn’t affected, but some transactions are lost if the primary vanishes. For in-region HA the near-zero RPO matters; for cross-region DR the latency of synchronous replication across a continent is untenable, so async is the only honest answer.

The third is how failover actually happens. Managed-database failover is usually a DNS flip: existing connections drop, clients reconnect to the same endpoint, and the DNS answer now resolves to a different instance. Sub-minute RTO means picking a shape whose internal machinery can promote and flip DNS in under a minute. Some shapes target around a minute; others, with multiple readable standbys to pick the most current promotion candidate from, target meaningfully less. For a sub-minute budget, “around a minute” is on the edge and “well under a minute” has headroom.

The fourth is who promotes on a regional failover, and when. Cross-region replication for traditional relational engines typically isn’t auto-promoted on source loss; promotion is a deliberate human-driven action. That’s a feature, not a limitation. In a world where regional events are often partial, recoverable, or misdiagnosed, auto-promoting could create a split-brain if the source Region comes back and both databases think they’re primary. Designing for a manual promotion gate means designing for a human to make the call, and for the RTO budget to include the time it takes to make that call.

The fifth is the cost shape of the composed answer. Multiple instances in the primary Region plus at least one in the recovery Region is several instance-sized bills where there was previously one. Whether standbys are readable or dark capacity matters: readable standbys can absorb existing read-replica workloads and partially offset the gross cost. The recovery-Region instance, if readable, can serve regional traffic in steady state too.

The sixth, easy to miss, is what the obvious-looking non-candidates actually do. Cross-region backup copy is backup, not replication; RPO equals the backup interval, which is too coarse for a five-minute target. General-purpose change-data-capture pipelines are built for heterogeneous replication (different engine, different target, different shape), and using them for same-engine same-shape replication means building by hand what the managed product ships as a feature. Both are the correct tools for other problems; one is underspecced and the other overbuilt for this one.

And finally, the elephant in the room: a different engine choice would change the shape of the answer entirely. Some engines ship multi-AZ storage by default and offer storage-layer cross-region replication with sub-second RPO and managed switchover. The scenario stipulates RDS PostgreSQL, which means those answers are unavailable. It’s worth being explicit about that, because “just switch engine” is the pattern people reach for even when they can’t.

What we’ll filter on

Distilling the exploration into filters:

Sub-minute AZ RTO: anything over a minute fails the in-region requirement.
Cross-region RPO ≤ 5 minutes: replication lag between us-east-1 and us-west-2 has to stay under five minutes under actual write load.
Cross-region RTO ≤ 1 hour: from “declare failover” to “us-west-2 accepting writes” in under sixty minutes.
Managed and reasonable: no DMS pipelines to own, no homegrown streaming replication, no snapshot-restore recovery path.

The RDS HA and DR landscape

Five mechanisms touch this problem.

Multi-AZ DB instance (single standby). The original Multi-AZ shape. One primary plus one synchronous standby in a different AZ, same Region. Storage is kept in lockstep by block-level synchronous replication. The standby does not serve reads; it is dark capacity held for failover. On failure, RDS promotes the standby and flips the endpoint CNAME to it. AWS cites “as quickly as 60 seconds”; in practice, expect 60-120 seconds. Supported on all RDS engines. About 2x single-AZ cost.
Multi-AZ DB cluster (two readable standbys). The newer three-AZ shape. One writer plus two readable standbys, one per AZ. Semi-synchronous replication: a commit returns once at least one standby acks. Both standbys are full RDS instances serving reads via the reader endpoint. Failover is typically under 35 seconds. Supported on MySQL 8.0.28+ and PostgreSQL 13.4+ with NVMe-local-SSD classes. About 2.5x single-AZ cost, partially offset if a dedicated read replica can retire into the cluster’s readers.
Cross-region read replica. An asynchronous replica of the primary in a different AWS Region. Engine-level replication (PostgreSQL streaming WAL over AWS’s backbone), not storage-level. Full RDS instance, readable and promotable via promote-read-replica. Promotion is one-way and manual for PostgreSQL; once promoted, the replica is a standalone writable database. Replication lag for a moderate-write SaaS workload between us-east-1 and us-west-2 typically sits in the single-digit seconds range. Up to five cross-region replicas per source.
AWS Backup cross-region copy. Scheduled snapshot backups copied to a vault in the destination Region. RPO equals the backup interval, so the five-minute target is out of reach. Restore times for production-sized databases land in tens of minutes to hours. Perfect for compliance archival and ransomware resilience; wrong tool for live DR of this shape.
AWS DMS with ongoing CDC. A general-purpose continuous-replication service. It excels when source and target are different: Aurora to Redshift, SQL Server to PostgreSQL, PostgreSQL to S3. For same-engine replication from managed RDS PostgreSQL to RDS PostgreSQL across Regions, DMS means building by hand what cross-region read replicas ship as a product feature, plus owning replication instances, tasks, schema-change coordination, and lag monitoring.

Side by side

Option	Sub-minute AZ RTO	Cross-region RPO ≤ 5 min	Cross-region RTO ≤ 1 h	Managed and reasonable
Multi-AZ DB instance (1 standby)	✓	✗	✗	✓
Multi-AZ DB cluster (2 readable standbys)	✓	✗	✗	✓
Cross-region read replica	✗	✓	✓	✓
AWS Backup cross-region copy	✗	✗	marginal	✓
DMS with ongoing CDC	✗	✓	✓	✗
DB cluster + cross-region replica	✓	✓	✓	✓

Nothing wins alone. Stack a Multi-AZ DB cluster with a cross-region read replica and all four attributes are covered with two managed features and no bespoke plumbing.

Matching scopes to mechanisms

Two failure scopes, two answers, composed onto one database. The in-region scope takes an automated semi-synchronous cluster; the region-level scope takes an asynchronous replica promoted by a human.

Multi-AZ DB cluster, in depth

The cluster shape earns its place by being fast enough to meet the sub-minute attribute with headroom and by paying back part of its cost through readable standbys.

Three AZs, one writer, two readers. Each AZ gets one instance. The writer accepts all connections via the cluster endpoint; the two standbys accept read connections via the reader endpoint, round-robin at DNS.

Semi-synchronous replication. On commit, the writer sends the change to both standbys and waits for at least one to ack before returning. Either reader’s ack satisfies the wait, so a slow standby doesn’t stall every commit the way it does on the single-standby shape. Under normal load, write latency is meaningfully lower than with a single synchronous standby.

Failover picks the most current reader. When the writer fails, RDS evaluates the two standbys and promotes whichever has the most recent change record. For PostgreSQL, promotion waits for the chosen reader’s lag to drain; the other reader doesn’t block. DNS flips, and the surviving reader rejoins as a standby of the new writer. AWS cites typically under 35 seconds for the full cycle.

Instance classes and engines. DB clusters require NVMe-local-SSD classes such as db.m5d, db.m6gd, db.m6id, db.r5d, db.r6gd, db.r6id, and db.x2iedn. Supported on MySQL 8.0.28+ and PostgreSQL 13.4+. The shape is not available for Oracle or SQL Server, and it is distinct from Aurora.

Cost. Three instances vs one, with storage provisioned once at cluster level but replicated three times under the hood. If the team was running a separate read replica for analytics today, the cluster’s readers absorb that load and the dedicated replica retires, partially offsetting the move.
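As a rough sketch, a cluster of this shape is created through the cluster-level API rather than by adding a flag to a single instance. Everything specific below is a placeholder chosen for illustration and not a value from this scenario: the identifier greenbox-use1-cluster, the subnet group name, the engine version, the instance size, and the storage figures. The subnet group is assumed to span three AZs.

```shell
# Sketch only: all identifiers, versions, and sizes are hypothetical.
# A Multi-AZ DB cluster is requested with create-db-cluster plus
# --db-cluster-instance-class, not with create-db-instance --multi-az.
create_multi_az_cluster() {
  aws rds create-db-cluster \
    --region us-east-1 \
    --db-cluster-identifier greenbox-use1-cluster \
    --engine postgres \
    --engine-version 15.4 \
    --db-cluster-instance-class db.r6gd.xlarge \
    --allocated-storage 400 \
    --storage-type io1 \
    --iops 3000 \
    --master-username postgres \
    --manage-master-user-password \
    --db-subnet-group-name greenbox-use1-subnets \
    --backup-retention-period 7
}
```

The function is only defined here, not invoked; check the engine version and instance class against what the Region currently offers for this shape before running it.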

The cross-region read replica, in depth

For the regional DR case a separate async mechanism does the work, and it’s the oldest, most understood tool in the RDS toolbox.

Asynchronous streaming over AWS’s backbone. PostgreSQL WAL records from the us-east-1 writer stream to the us-west-2 replica; the replica replays them in order and exposes a standard read-only PostgreSQL connection. Lag depends on write volume, network RTT (~60-70 ms between us-east-1 and us-west-2), and replica instance class. For a moderate SaaS workload, sub-5-second lag is typical.
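“Typical” is worth verifying against the real workload. One way to do that, sketched below, is to pull the worst recent value of the CloudWatch ReplicaLag metric (reported in seconds) for the replica and compare it with the 300-second RPO budget. The function names are made up for this post, the ten-minute window is an arbitrary choice, and the relative timestamp assumes GNU date.

```shell
# replica_lag_seconds INSTANCE_ID
# Prints the worst ReplicaLag (seconds) seen over the last 10 minutes,
# or "None" when CloudWatch returned no datapoints.
# Assumption: GNU date (the -d flag) is available.
replica_lag_seconds() {
  aws cloudwatch get-metric-statistics \
    --region us-west-2 \
    --namespace AWS/RDS \
    --metric-name ReplicaLag \
    --dimensions Name=DBInstanceIdentifier,Value="$1" \
    --start-time "$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 60 \
    --statistics Maximum \
    --query 'max(Datapoints[].Maximum)' \
    --output text
}

# lag_within_rpo LAG_SECONDS [BUDGET_SECONDS]
# Exit 0 if lag is inside the budget, 1 if over it, 2 if the value is
# missing or non-numeric (treated as a failure, never as zero lag).
lag_within_rpo() {
  awk -v lag="$1" -v budget="${2:-300}" 'BEGIN {
    if (lag !~ /^[0-9]+([.][0-9]+)?$/) exit 2
    exit !(lag + 0 <= budget + 0)
  }'
}
```

A check might then read: lag_within_rpo "$(replica_lag_seconds greenbox-usw2-replica)" 300 || echo "replica lag is outside the RPO budget". Treating a missing datapoint as a failure is deliberate, since a replica that has stopped reporting is not a replica with zero lag.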

Readable during steady state. The replica is a full read-only replica. West-coast applications or analytics jobs can point at it. Not dark capacity.

Promotion is a manual API call. When the team declares regional failover, one call detaches the replica from its source and converts it to a standalone writable database.

aws rds promote-read-replica \
  --region us-west-2 \
  --db-instance-identifier greenbox-usw2-replica \
  --backup-retention-period 7

Promotion is one-way. If us-east-1 recovers, the old primary holds stale data. Reconnecting them means a new replication build, not a resume.

Application cutover and in-region HA. The application’s writer endpoint moves from the us-east-1 cluster endpoint to the us-west-2 replica endpoint, via either a connection-pool config flag or a Route 53 CNAME swung on failover. The replica is single-AZ by default; after promotion, modify-db-instance --multi-az (or a move to Multi-AZ DB cluster) restores AZ-level HA on the new primary. Both are managed operations, not hand-rolled ones.
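Both follow-up steps can be sketched with the AWS CLI. The two function names are invented for this post, and the hosted zone ID, record name, and 30-second TTL in the usage note are placeholders. This assumes the application reaches the writer through a CNAME the team controls, which the scenario does not state.

```shell
# swing_writer_cname ZONE_ID RECORD_NAME TARGET_ENDPOINT
# Repoints an application-owned CNAME at the promoted instance's endpoint.
swing_writer_cname() {
  aws route53 change-resource-record-sets \
    --hosted-zone-id "$1" \
    --change-batch "{
      \"Changes\": [{
        \"Action\": \"UPSERT\",
        \"ResourceRecordSet\": {
          \"Name\": \"$2\",
          \"Type\": \"CNAME\",
          \"TTL\": 30,
          \"ResourceRecords\": [{\"Value\": \"$3\"}]
        }
      }]
    }"
}

# restore_az_ha INSTANCE_ID
# Waits for the promoted instance to report "available", then asks RDS
# to add a standby in another AZ without waiting for a maintenance window.
restore_az_ha() {
  aws rds wait db-instance-available \
    --region us-west-2 --db-instance-identifier "$1" &&
  aws rds modify-db-instance \
    --region us-west-2 --db-instance-identifier "$1" \
    --multi-az --apply-immediately
}
```

In use, that could look like swing_writer_cname ZEXAMPLE db-writer.example.internal followed by the promoted instance’s endpoint, then restore_az_ha greenbox-usw2-replica. A short TTL keeps the cutover fast, at the cost of more DNS lookups in steady state.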

A worked example: two failover traces

Two separate failures. Two very different responses.

Scenario A: one AZ in us-east-1 goes dark. 14:02 UTC. us-east-1a loses power. The cluster’s writer was there.

T+0 to T+10s. Writer stops responding. RDS health checks detect.
T+10s to T+30s. RDS evaluates the two standbys and promotes the one with the most recent change record. Cluster endpoint DNS updates.
T+30s to T+60s. Application connection pools refresh. New transactions land on the new writer; reads continue via the reader endpoint on the surviving standby.
T+a few minutes. RDS provisions a replacement reader in us-east-1a (or an alternative AZ) and re-establishes semi-sync.

Nothing manual. On-call engineer sees the alert, watches it recover, closes the ticket. RPO ~0, RTO typically under 35 seconds.
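The “under 35 seconds” figure is what RDS reports; what the application experiences also includes DNS caching and pool refresh. A small probe, sketched here, can measure that from a client’s vantage point by polling the cluster endpoint with pg_isready until it answers again. The function name is hypothetical and the endpoint hostname is passed in as an argument, since the scenario doesn’t name one. Note that pg_isready only shows the server is accepting connections, not that it accepts writes.

```shell
# wait_for_writer HOST [ATTEMPTS] [DELAY_SECONDS]
# Polls HOST:5432 until PostgreSQL accepts connections. Prints the number
# of attempts it took and returns 0, or returns 1 if attempts run out.
wait_for_writer() {
  host="$1"
  attempts="${2:-60}"
  delay="${3:-1}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -q: no output, -t 2: give up on each probe after two seconds
    if pg_isready -q -h "$host" -p 5432 -t 2; then
      echo "$i"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

With the default one-second delay, the printed attempt count is a coarse estimate of the outage in seconds as that client saw it.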

Scenario B: us-east-1 itself becomes unreachable. 14:02 UTC. A regional event: control-plane outage, cable cut, or correlated AZ failure.

T+0 to T+15m. Incident commander confirms us-east-1 isn’t recovering. AWS Health Dashboard confirms the regional event. Team declares failover.
T+15m. An engineer runs promote-read-replica. Lag at the moment of failure was 3 seconds, so those writes are lost. RPO: 3 seconds, inside the 5-minute target.
T+15m to T+20m. Promotion completes. The us-west-2 replica is now a standalone writable database.
T+20m to T+30m. Application config updates the writer connection string to the us-west-2 endpoint.
T+30m to T+40m. modify-db-instance --multi-az on the new primary restores in-region HA.

RTO: about 30-40 minutes wall-clock, comfortably inside the 1-hour target. The only manual steps are the promotion command, the config change, and the follow-up Multi-AZ modification.
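The arithmetic behind that range, using the step durations from the trace above (15 minutes to decide, 5 to promote, 10 to cut the application over, 10 more to restore Multi-AZ), can be written out as a small budget check. The helper name is made up for this post; swapping in numbers from a real game day shows how much of the hour is left.

```shell
# rto_minutes STEP_MINUTES...
# Sums whole-minute step durations and prints the total.
rto_minutes() {
  total=0
  for step in "$@"; do
    total=$((total + step))
  done
  echo "$total"
}

# Decision (15) + promotion (5) + application cutover (10)
writes_restored=$(rto_minutes 15 5 10)
# Same steps plus the Multi-AZ modification (10)
ha_restored=$(rto_minutes 15 5 10 10)

echo "writes restored at T+${writes_restored}m, HA restored at T+${ha_restored}m"
[ "$ha_restored" -le 60 ] && echo "inside the 60-minute budget"
```

The decision step is the largest single item, which is why the time a human takes to make the call has to be part of the RTO budget.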

When us-east-1 recovers days later, reverting is unhurried: provision a fresh cluster in us-east-1, build it as a cross-region replica of the current us-west-2 primary, wait for catch-up, promote us-east-1, rebuild us-west-2 as a replica. Scheduled for a quiet Wednesday morning.

Why this isn’t Aurora

Aurora has a very different architecture for the same problem, and the distinction is a common source of confusion.

Aurora’s storage is always multi-AZ. Even a minimal Aurora cluster with one DB instance sits on a cluster volume automatically replicated across three AZs. There is no “Multi-AZ option” to turn on for Aurora’s storage; multi-AZ storage is the default. Aurora’s options are about additional DB instances (readers in other AZs) so that compute is highly available too.

Aurora’s cross-region story is Aurora Global Database. A separate product: storage-layer cross-region replication, typical sub-second lag, managed switchover and failover APIs, up to ten secondary regions. Purpose-built for sub-minute RPO on cross-region DR.

Neither applies to RDS PostgreSQL. The storage layer is not magically multi-AZ; single-AZ means single-AZ until Multi-AZ is provisioned. There is no Global Database for non-Aurora RDS; cross-region replication means engine-level read replicas, async, manually promotable, one-way. The scenario stipulates RDS PostgreSQL; the Aurora answers are unavailable.

Why not Multi-AZ DB instance instead

The single-standby Multi-AZ shape is the obvious alternative. It’s supported on all RDS engines and on any instance class. For this scenario, DB cluster wins on three axes: faster failover (under 35 s vs 60-120 s), readable standbys (offsets cost), and three AZs not two (a correlated two-AZ event leaves one surviving reader to promote).

DB instance wins where the engine isn’t supported on DB cluster (Oracle, SQL Server, MariaDB, MySQL before 8.0.28) or where the NVMe-local-SSD class set doesn’t fit. Neither constraint applies here.

What’s worth remembering

Two Multi-AZ shapes exist. DB instance: 1 primary + 1 synchronous standby, DNS failover in 60-120 s, standby serves nothing. DB cluster: 1 writer + 2 readable standbys across 3 AZs, semi-sync, failover typically under 35 s, MySQL 8.0.28+ / PostgreSQL 13.4+ on NVMe-local-SSD classes.
Cross-region read replicas are async and manually promotable. PostgreSQL cross-region replicas are not auto-promoted on source loss – promote-read-replica is always a human-driven action.
Same-region Multi-AZ and cross-region replication are different problems with different tools. Multi-AZ is synchronous or semi-sync, automatic, in-region. Cross-region is asynchronous, manual promotion, across regions. Neither substitutes for the other.
AWS Backup cross-region copy is backup, not replication. RPO equals backup frequency; sub-5-minute RPO is out of reach.
DMS is for heterogeneous replication. Same-engine RDS-to-RDS is the job of cross-region read replicas. DMS adds operational burden for no gain in the homogeneous case.
Aurora’s storage is always multi-AZ; RDS PostgreSQL’s isn’t. Aurora replicates the storage volume across three AZs by default and offers Global Database for cross-region. RDS PostgreSQL is a conventional single-node database by default.
Failover numbers to memorise. Multi-AZ DB instance: ~60-120 seconds, DNS-based. Multi-AZ DB cluster: typically under 35 seconds. Cross-region replica promotion: minutes of promotion, tens of minutes of total RTO.
Readable standbys offset cost. A DB cluster’s reader endpoint can retire a dedicated read replica that existed for analytics, softening the move from 1x to 2.5x spend.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.