When Aurora Should Scale Itself

September 23, 2026 · 17 min read

Solutions Architect · SAA-C03 · part of The Exam Room

The situation

Three Aurora PostgreSQL 16 clusters, one AWS account, three very different usage patterns.

  • Tenant database, multi-tenant OLTP backing the SaaS product. Weekday load is flat at about 4 vCPU / 16 GiB. Saturday 08:00-14:00 it spikes to 14 vCPU / 48 GiB as customers pull monthly reports. Sunday it’s back to flat. Ninety-five percent of the week looks like the flat baseline.
  • Reporting replica, a read replica used by the BI team from 09:00 to 11:00 on weekdays. The rest of the day it sits idle. Queries are heavy; during the two-hour window it consistently pins whatever instance it’s on.
  • Dev cluster, the team’s shared non-production cluster. Used roughly 08:00-17:00 Monday to Friday, off the rest of the time. Load is erratic but peaks well below production.

All three today run on db.r6g.2xlarge provisioned (8 vCPU, 64 GiB). The tenant DB is sized with the weekend peak in mind, which leaves it with about twice the vCPU and four times the memory it needs on a Tuesday afternoon. The reporting replica sits idle 22 hours out of 24. The dev cluster runs 168 hours a week and is used for maybe 45 of them. Finance is raising the question politely; ops is asking whether Serverless v2 is now a real option.

What actually matters

Before reaching for the Aurora pricing calculator it’s worth naming the trade.

Provisioned Aurora gives a fixed instance class (db.r6g.2xlarge, db.r7g.4xlarge, and so on) that runs continuously at that size. It is billed per hour on the instance class; the capacity is whatever the class brings, regardless of load. The operational story is straightforward: pick the class, scale vertically by resizing (minutes of downtime for the primary, near-zero if failing over to a replica first), scale horizontally by adding reader instances. Reserved Instance pricing gets you a substantial discount when the class is stable.

Aurora Serverless v2 gives a fluid instance that scales between a minimum and maximum Aurora Capacity Unit (ACU). An ACU is roughly 2 GiB of memory and proportional CPU; you set MinCapacity (today, anywhere from 0 to some ceiling) and MaxCapacity (today up to 256 ACU), and Aurora slides the instance’s capacity between those bounds in near-real-time based on load. Billed per ACU-second. No instance class exists; the cluster has a writer and optional reader endpoints, but the thing behind the endpoint is a capacity-elastic runtime.

The core tension is predictability of load. If the load is flat or shaped in a way that matches an instance class neatly, provisioned is cheaper because the per-hour rate is lower than the equivalent average ACU-seconds. If the load varies by 3x or more across the week, or is bursty, or idles for long stretches, Serverless v2 starts to win because you only pay for the capacity you use.

Second: the cost of being wrong about the peak. Provisioned forces a choice up front, and the choice is usually conservative, “size for the peak plus a buffer”. The buffer runs 168 hours a week. Serverless v2 scales into the peak dynamically; if the peak is rare and short, the cost difference is the cost of running the buffer on provisioned.

Third: scaling response time. Provisioned autoscaling (adding readers when utilisation is high) has a minutes-scale reaction time: detect, launch, warm, add to the reader endpoint. Serverless v2 scales the same instance up and down on a seconds-to-sub-minute timescale. If the workload has sharp transitions, like the tenant DB’s 4-to-14 vCPU jump, Serverless v2 absorbs the transition without a new-instance cold start.

Fourth: scale-to-zero. Aurora Serverless v2 now supports scale-to-zero: when nothing connects for a configurable inactivity window (5 minutes or more), the cluster pauses and billing drops to zero ACU-seconds (storage still bills). Waking from zero takes on the order of 10-15 seconds for the first query after idle. Provisioned doesn’t pause short of stopping the cluster, and a stopped cluster auto-starts after 7 days, so stopping is not continuous cost control.

Fifth: minimum floor. Serverless v2’s non-zero minimum matters. If MinCapacity = 0.5, the instance runs at 0.5 ACU (1 GiB, tiny CPU) when idle, billed continuously; if MinCapacity = 0, idle periods bill nothing, but the first connection after idle pays a cold-start. The choice between “always warm at 0.5” and “cold-start-on-first-query at 0” is a latency-vs-cost call.
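A minimal sketch of how the Min/Max bounds and the auto-pause window might be expressed, assuming boto3's RDS client and its ServerlessV2ScalingConfiguration parameter. The helper names scaling_config and apply_scaling are illustrative, and the range checks only mirror the limits quoted in this post; they are not an authoritative list.

```python
from typing import Optional

ACU_STEP = 0.5           # capacity is set in half-ACU increments
MAX_ACU = 256            # ceiling quoted in this post
MIN_PAUSE_SECONDS = 300  # the "5 minutes or more" inactivity window


def scaling_config(min_acu: float, max_acu: float,
                   auto_pause_seconds: Optional[int] = None) -> dict:
    """Build a ServerlessV2ScalingConfiguration dict for boto3's RDS client."""
    for name, value in (("min_acu", min_acu), ("max_acu", max_acu)):
        if value < 0 or value > MAX_ACU or (value / ACU_STEP) % 1 != 0:
            raise ValueError(f"{name} must be a multiple of {ACU_STEP} in 0..{MAX_ACU}")
    if max_acu <= 0 or max_acu < min_acu:
        raise ValueError("max_acu must be positive and not below min_acu")
    config = {"MinCapacity": min_acu, "MaxCapacity": max_acu}
    if auto_pause_seconds is not None:
        if min_acu != 0:
            raise ValueError("auto-pause only applies when min_acu is 0")
        if auto_pause_seconds < MIN_PAUSE_SECONDS:
            raise ValueError("auto-pause window must be at least 300 seconds")
        config["SecondsUntilAutoPause"] = auto_pause_seconds
    return config


def apply_scaling(cluster_id: str, config: dict) -> None:
    """Send the config to RDS. Needs boto3 and AWS credentials; not run here."""
    import boto3
    boto3.client("rds").modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        ServerlessV2ScalingConfiguration=config,
        ApplyImmediately=True,
    )


warm_floor = scaling_config(0.5, 8)        # always warm at 0.5 ACU
scale_to_zero = scaling_config(0, 8, 300)  # pause after 5 idle minutes
```

The two configurations at the bottom are the two sides of the latency-vs-cost call: the first never pays a cold start, the second never pays for idle capacity.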

Sixth: the cost crossover. A rough heuristic: if the workload’s average utilisation is greater than about 60-70% of the provisioned class’s capacity across the billing period, provisioned is cheaper. Below that, Serverless v2 is cheaper. The exact crossover depends on regional pricing and Reserved Instance commitments, but the 60-70% threshold is a useful first-approximation across most workloads.
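One way to sanity-check the heuristic is to compute the break-even directly. The sketch below assumes the illustrative rates used later in this post ($0.58/hour for db.r6g.2xlarge on-demand, $0.22/hour effective on a 3-year all-upfront RI, $0.12 per ACU-hour); the function names are hypothetical and real regional prices will differ.

```python
def breakeven_avg_acu(instance_hourly: float, acu_hourly: float) -> float:
    """Average ACU at which Serverless v2 compute costs the same as the instance."""
    return instance_hourly / acu_hourly


def cheaper_mode(avg_acu: float, instance_hourly: float, acu_hourly: float) -> str:
    """Compare one hour of average Serverless v2 usage with one instance-hour."""
    serverless_hourly = avg_acu * acu_hourly
    return "serverless-v2" if serverless_hourly < instance_hourly else "provisioned"


ON_DEMAND = 0.58  # $/hour, db.r6g.2xlarge on-demand (this post's figure)
RI_3YR = 0.22     # $/hour, 3-year all-upfront RI (this post's figure)
ACU_RATE = 0.12   # $/ACU-hour (this post's figure)

print(round(breakeven_avg_acu(ON_DEMAND, ACU_RATE), 2))  # 4.83
print(round(breakeven_avg_acu(RI_3YR, ACU_RATE), 2))     # 1.83
```

Against on-demand the break-even sits near 4.8 average ACU; against the 3-year RI it drops below 2, which is why the threshold moves so much once a commitment is on the table.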

What we’ll filter on

  1. Load shape: flat, bursty, or idle-heavy?
  2. Utilisation across the billing period: average ACU equivalent vs provisioned class capacity?
  3. Scaling response time required: minutes, seconds, or no scaling at all?
  4. Scale-to-zero viability: can the workload tolerate 10-15s cold starts after idle?
  5. Reserved Instance eligibility: is the instance class stable enough to commit?
  6. Operational complexity: does the team want to tune instance class and replicas, or delegate it?

The Aurora capacity landscape

  1. Aurora Provisioned, on-demand. Classic instance-class pricing. Pick db.r6g, db.r7g, db.r8g, db.r6i etc.; pay per hour. Scale vertically by modifying instance class (brief failover if the primary is resized). Scale horizontally via reader instances and the Aurora Reader endpoint; target-tracking autoscaling on CPU or connections adds/removes readers between configurable bounds. Reserved Instances for 1 or 3 years cut the hourly rate up to ~65% for 3-year all-upfront.

  2. Aurora Serverless v2. Capacity in ACUs, slides between Min and Max continuously. MinCapacity = 0 enables scale-to-zero pause; otherwise the capacity floor holds. Writer and readers can all be Serverless v2, or Serverless v2 instances can be mixed with provisioned ones in the same cluster. Scaling is vertical within the instance; horizontal scaling still uses reader instances, which can themselves be Serverless v2. Cost model: per ACU-second. No RI option; the pricing is on-demand only today.

  3. Aurora Serverless v1 (legacy). The original Serverless. It paused during inactivity, scaled in coarse steps, and had quirks with concurrent-connection migration. Still available but on the way out; new designs go to v2. It is mentioned here because some older architectures run v1 and the migration path to v2 is worth knowing.

  4. Mixed mode. Writer on provisioned, readers on Serverless v2 (or vice versa). Legal, useful when the writer’s load is stable and predictable but the reader pool has bursty analytical workloads that benefit from elastic scaling.

  5. Aurora Global Database. Not a capacity mode per se, but worth naming: Global Database replicates an Aurora cluster across Regions with sub-second replication, and secondary Regions can be provisioned or Serverless v2. Changes the capacity conversation for multi-Region workloads.
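For the mixed-mode option in the list above, the difference between a provisioned reader and a Serverless v2 reader comes down to the instance class passed when the reader is created. A hedged sketch, assuming boto3's create_db_instance call; the helper name and the cluster and instance identifiers are hypothetical.

```python
def reader_request(cluster_id: str, instance_id: str, serverless: bool,
                   instance_class: str = "db.r6g.2xlarge") -> dict:
    """Keyword arguments for boto3 rds.create_db_instance adding a reader to a cluster."""
    return {
        "DBClusterIdentifier": cluster_id,
        "DBInstanceIdentifier": instance_id,
        "Engine": "aurora-postgresql",
        # "db.serverless" is the instance class that selects Serverless v2 capacity
        "DBInstanceClass": "db.serverless" if serverless else instance_class,
    }


elastic_reader = reader_request("tenant-cluster", "tenant-reader-sv2", serverless=True)
fixed_reader = reader_request("tenant-cluster", "tenant-reader-r6g", serverless=False)

# With credentials in place the request would be sent as:
#   boto3.client("rds").create_db_instance(**elastic_reader)
```

The cluster needs a Serverless v2 scaling configuration (Min and Max ACU) set before a db.serverless instance can join it.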

Side by side

Mode | Scaling | Scale-to-zero | RI eligible | Cost model | Best load shape
--- | --- | --- | --- | --- | ---
Provisioned on-demand | vertical (resize) + horizontal (readers) | No | Yes | per instance-hour | Flat, predictable
Provisioned + RI | same | No | Yes | discounted instance-hour | Flat, stable, long-lived
Serverless v2, Min > 0 | continuous within Min-Max | No | No | per ACU-second | Bursty, variable
Serverless v2, Min = 0 | continuous, pause when idle | Yes | No | per ACU-second (0 when paused) | Intermittent, long idles
Serverless v1 | coarse steps + pause | Yes | No | per ACU-second | Legacy only

Reading the table by workload:

  • Tenant DB, flat weekdays, sharp weekend peaks. Cost-model-wise: if average utilisation across the week is below ~65%, Serverless v2 Min > 0 wins; if it’s above, Provisioned + RI wins. Given the described shape (flat at 4 vCPU all week except a Saturday-morning peak at 14 vCPU), the average is a little over 4 vCPU, roughly half of an 8-vCPU provisioned class and under the crossover, so Serverless v2 is the call.
  • Reporting replica, idle 22h/day, pinned 2h/day. Scale-to-zero plus generous Max is the shape; the 2 hours of pinned work dominate the cost, but the 22 hours of zero-billing dominate the savings. Serverless v2 Min = 0 with Max sized to the query load.
  • Dev cluster, on for 9 hours, off for 15 hours, off all weekend. Scale-to-zero plus a scheduled stop at 20:00 as belts-and-braces. Serverless v2 Min = 0 wins handily against any provisioned option that runs 168 hours a week.
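The reading above boils down to two questions asked in order. A rough sketch of that ordering, with a hypothetical function name and the ~65% crossover from this post as a default; the input values in the example calls are illustrative only, not measurements.

```python
def pick_mode(avg_utilisation: float, long_idle_stretches: bool,
              tolerates_cold_start: bool, crossover: float = 0.65) -> str:
    """Map a load shape to a capacity mode using this post's rules of thumb."""
    # Long idle tails win on scale-to-zero, provided a cold start is acceptable.
    if long_idle_stretches and tolerates_cold_start:
        return "Serverless v2, Min = 0"
    # Otherwise compare average utilisation of the provisioned class to the crossover.
    if avg_utilisation < crossover:
        return "Serverless v2, Min > 0"
    return "Provisioned"


print(pick_mode(0.50, long_idle_stretches=False, tolerates_cold_start=False))
print(pick_mode(0.10, long_idle_stretches=True, tolerates_cold_start=True))
print(pick_mode(0.75, long_idle_stretches=False, tolerates_cold_start=False))
```

A real decision would also weigh Reserved Instance commitments and operational preferences; the function only captures the first-pass filter.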

The load-shape to mode mapping

[Figure: weekly capacity profiles, Monday to Sunday, for the three workloads]
  • Tenant DB (weekdays flat, Saturday-morning spike): Serverless v2, Min 2 ACU, Max 16 ACU. Handles a 3.5x peak/trough swing without pre-sizing.
  • Reporting replica (two-hour weekday window, otherwise zero): Serverless v2, Min 0 (scale-to-zero), Max 8 ACU. Pauses when idle; ~10 h billed per week vs 168 h provisioned.
  • Dev cluster (weekday office hours, off evenings and weekends): Serverless v2, Min 0 (scale-to-zero), Max 4 ACU, plus a scheduled stop at 20:00. ~45 h billed per week.
Three load shapes, three Serverless v2 configurations. The tenant DB keeps a warm floor; the reporting replica and dev cluster both scale to zero when unused.

The picks in depth

Tenant DB → Serverless v2, Min 2 ACU, Max 16 ACU. The flat weekday load of ~4 ACU sits well above the Min 2 floor, so steady-state billing is around 4 ACU-hours per hour. The Saturday spike scales the writer up within seconds as connection count and CPU climb; the scale-up is smooth enough that the application sees latency creep briefly but not connection drops. Max 16 caps runaway scaling in case a bad query or a misconfigured batch job tries to consume the whole cluster. Storage scales independently; for a 400 GiB database the storage bill is $0.10/GiB-month regardless of capacity mode.

Estimated cost: (~4 ACU × 162 h + ~10 ACU × 6 h on Saturday) × $0.12/ACU-hour = ~$85/week in compute. The equivalent db.r6g.2xlarge at on-demand is ~$0.58/hour × 168 = ~$97/week. Saving ~13%, and the writer handles the Saturday peak without autoscaling-by-resize or reader proliferation.

Reporting replica → Serverless v2, Min 0, Max 8 ACU. The cluster runs as a standalone Serverless v2 cluster (or an Aurora reader in the tenant cluster, same shape). MinCapacity = 0 means the moment the BI team disconnects, the pause timer starts; default 5 minutes of inactivity triggers the pause. The next query after a pause waits ~12 seconds while the cluster wakes; the BI team’s first query of the morning is always a bit slow, which is acceptable.

Estimated cost: 10 hours × 6 ACU × $0.12 = ~$7.20/week. A db.r6g.2xlarge running continuously for this same workload is ~$97/week. Saving ~92%.

Dev cluster → Serverless v2, Min 0, Max 4 ACU, plus EventBridge schedule. Min 0 provides the scale-to-zero benefit; an EventBridge Scheduler rule at 20:00 on weekdays calls rds:StopDBCluster as a belt against anybody leaving the cluster busy overnight. Unlike an auto-paused cluster, a stopped cluster does not wake on the next connection, so a matching schedule (or a manual start) has to call rds:StartDBCluster before the team arrives. Serverless v2 auto-pause is faster and cheaper than a full cluster stop, but the scheduled stop guarantees zero cost on the dev cluster at night regardless of long-running queries someone left open in a disconnected IDE.

Estimated cost: 45 hours × 2 ACU × $0.12 = ~$11/week. Provisioned at same instance class: ~$97. Saving ~89%.
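The scheduled stop for the dev cluster could be wired up in several ways. The sketch below is one variant, not necessarily the setup described above: it assumes a small Lambda function invoked by the schedule rather than a direct Scheduler-to-RDS target, and the event shape and function names are hypothetical. The boto3 calls used are describe_db_clusters and stop_db_cluster.

```python
# Assumed schedule expression: 20:00 on weekdays, in whatever time zone
# the schedule itself is configured with.
STOP_SCHEDULE = "cron(0 20 ? * MON-FRI *)"


def stop_if_running(rds_client, cluster_id: str) -> bool:
    """Stop the cluster if it is currently available; return True if a stop was issued."""
    clusters = rds_client.describe_db_clusters(DBClusterIdentifier=cluster_id)
    status = clusters["DBClusters"][0]["Status"]
    if status != "available":
        return False  # already stopped, stopping, or otherwise not in 'available'
    rds_client.stop_db_cluster(DBClusterIdentifier=cluster_id)
    return True


def handler(event, context):
    """Lambda entry point; expects {"cluster_id": "..."} in the event (hypothetical shape)."""
    import boto3  # imported here so the helper above can be tested without AWS access
    return {"stopped": stop_if_running(boto3.client("rds"), event["cluster_id"])}
```

Checking the status first keeps the nightly run quiet on evenings when the cluster is already stopped, instead of raising an invalid-state error.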

When provisioned wins

The three workloads above all favour Serverless v2, but the question deserves an honest answer in the other direction. Provisioned wins when:

  • Load is genuinely flat and predictable. A production OLTP cluster running at 70-80% of a db.r6g.2xlarge all week, every week, is cheaper on provisioned + 3-year RI than on Serverless v2 at the equivalent average ACU. The RI discount (~65%) exceeds the elasticity savings for a workload that isn’t elastic.
  • Peak capacity exceeds Serverless v2 Max. Serverless v2 caps at 256 ACU (roughly db.r6g.16xlarge-equivalent). Workloads larger than that need provisioned.
  • Instance-level features are required. Some Aurora features (enhanced monitoring granularities, specific engine parameters, certain extension configurations) interact differently with Serverless v2; check the compatibility matrix for the PostgreSQL or MySQL version in use.
  • Cost predictability dominates. Per ACU-second billing is cheaper on average but less predictable month-to-month. If finance prefers a flat monthly line, provisioned + RI gives it.

A common compromise: writer provisioned (predictable), readers Serverless v2 (elastic). The writer’s base load is stable; reader load is bursty (analytics, ad-hoc queries), and Serverless v2 readers absorb the bursts without running 168 h/week.

A worked cost trace: the tenant DB over one week

Mon 00:00–Sat 07:59  (128 hours at ~4 ACU steady)
  128 × 4 × $0.12                                    =  $61.44

Sat 08:00–Sat 13:59  (6 hours ramping 4→14→4)
  avg ~10 ACU × 6 × $0.12                            =   $7.20

Sat 14:00–Sun 23:59  (34 hours at ~4 ACU)
  34 × 4 × $0.12                                     =  $16.32

Serverless v2 compute total                           =  $84.96
Storage (400 GiB × $0.10/GiB-month / 4.35 weeks)      =   $9.20
Total (Serverless v2)                                 =  $94.16

Equivalent provisioned db.r6g.2xlarge, on-demand:
  168 h × $0.58                                      =  $97.44
Equivalent provisioned, 1yr RI no-upfront:
  168 h × $0.37                                      =  $62.16
Equivalent provisioned, 3yr RI all-upfront:
  168 h × $0.22                                      =  $36.96

The interesting line: storage costs the same in either mode, so the comparison that matters is compute against compute. Serverless v2 at $84.96 beats provisioned on-demand by ~$12/week but loses to 3-year all-upfront RI by ~$48/week. On a three-year horizon where the workload’s shape is known with confidence, the RI wins. On a one-year horizon, or where the workload shape is still being validated, Serverless v2’s flexibility is worth the ~$23/week premium over a 1-year RI.

The decision isn’t “which is cheaper” in isolation; it’s “how confident are we about the load shape three years from now?”. Serverless v2 is a calculated hedge; a 3-year RI is a confident bet.
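The weekly trace above is simple enough to reproduce in a few lines, which makes it easy to re-run with different rates or a different load shape. This is a sketch using the illustrative prices from this post; the function and variable names are hypothetical.

```python
ACU_RATE = 0.12  # $/ACU-hour, the rate used in this post


def compute_cost(segments, rate=ACU_RATE):
    """Sum hours x average ACU x rate over (hours, avg_acu) segments."""
    return sum(hours * acu * rate for hours, acu in segments)


# (hours, average ACU): Mon 00:00 to Sat 07:59, the Saturday spike, the rest of the week
tenant_week = [(128, 4), (6, 10), (34, 4)]
assert sum(hours for hours, _ in tenant_week) == 168  # every hour is billed at Min > 0

serverless_compute = compute_cost(tenant_week)
storage = 400 * 0.10 / 4.35  # 400 GiB at $0.10/GiB-month, spread over ~4.35 weeks
provisioned = {
    "on-demand": 168 * 0.58,
    "1yr RI no-upfront": 168 * 0.37,
    "3yr RI all-upfront": 168 * 0.22,
}

print(f"Serverless v2 compute: ${serverless_compute:.2f}")  # $84.96
print(f"Storage:               ${storage:.2f}")             # $9.20
for name, cost in provisioned.items():
    print(f"Provisioned {name}: ${cost:.2f}")               # $97.44, $62.16, $36.96
```

The same function covers the other two picks: compute_cost([(10, 6)]) gives the reporting replica's $7.20 and compute_cost([(45, 2)]) gives the dev cluster's $10.80.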

What’s worth remembering

  1. Serverless v2 is sized in ACUs, not instance classes. An ACU is roughly 2 GiB and proportional CPU; the cluster scales between Min and Max continuously, billed per ACU-second.
  2. Min capacity is a latency-vs-cost knob. Min > 0 keeps the cluster warm; Min = 0 enables pause and a ~10-15s cold start on first query after idle.
  3. Provisioned is cheaper than Serverless v2 above ~65% average utilisation. Below that threshold, Serverless v2’s elasticity beats the continuous cost of an oversized class. The threshold moves with RI pricing.
  4. Reserved Instances only apply to provisioned. Serverless v2 has no RI equivalent today. If the load shape is stable enough to commit, provisioned + RI is the deepest discount.
  5. Mixed-mode clusters are legal and often optimal. Writer provisioned, readers Serverless v2, is a good shape for workloads with stable writes and bursty reads.
  6. Storage bills separately and identically. Aurora storage is $0.10/GiB-month across both modes, with I/O charges layered on top depending on the cluster’s storage configuration (standard vs I/O-Optimized).
  7. Scheduled stop is a belt over Serverless v2 pause. rds:StopDBCluster on a schedule guarantees zero cost on non-production clusters even if pause doesn’t trigger (open connections, long-running queries).
  8. Horizontal scaling still uses reader instances. Serverless v2 scales a single writer vertically within its Min-Max range; beyond a single-writer workload, readers (Serverless v2 or provisioned) handle the horizontal story.

Three load shapes, three capacity configurations, and all three happen to be Serverless v2 in this portfolio, because every one of them has a wide variance between peak and trough or an idle tail that provisioned can’t match on cost. For the hypothetical fourth workload that runs flat and heavy 168 hours a week, the answer would flip.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.