Composing Savings Plans Across a Portfolio

The situation

The portfolio, as it stands:

Core services account (stable, production). Five services on EC2 and Fargate, ~300 instance-equivalents of compute, 24/7, stable shape for 18+ months. About $420,000/year on-demand equivalent.
Data platform account (bursty analytics). Eight Redshift and EMR workloads, five of which run on demand and three continuously. ~150 instance-equivalents of compute on EMR during business hours, negligible overnight. Roughly $180,000/year on-demand.
Dev and test account. Ephemeral fleets, varying week to week; teams spin up environments on demand. ~40 instance-equivalents averaged. About $60,000/year.
CI and batch account. Ten thousand CI jobs a week running in ECS on Fargate Spot today; a nightly batch pipeline running on EC2 Spot with on-demand fallback. ~80 instance-equivalents averaged. About $90,000/year.

Two Regions: eu-west-1 is the primary; us-east-1 carries a small latency-sensitive tier for North American customers (~5% of total). Over three years, the forecast says core services grow 20%, the data platform doubles as the product team ships more analytics features, dev stays flat, and CI stays flat or slightly grows.

The question isn’t “which pricing model wins”. The question is what portfolio shape of commitments covers the baseline at the best discount while leaving room for the growth and the inevitable surprises.

What actually matters

The foundational pricing post (“Steady, Bursty, and Throwaway”) answered the question for a single workload: match the rhythm to the pricing model. This post is that question at the portfolio level, where the arithmetic changes in subtle ways.

The first thing that changes is what “baseline” means across a portfolio. A single workload’s baseline is its minimum instance-hours per hour. A portfolio’s baseline is the minimum total instance-hours per hour across all workloads, which is typically higher than the sum of individual minimums because workloads don’t all dip at the same time. Savings Plans cover aggregate spend rather than specific workloads, so a Compute Savings Plan sized to the portfolio’s baseline spend captures the portfolio’s constant background hum without locking any particular workload into a shape.
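A minimal sketch of that effect, using made-up hourly profiles (the three functions and their numbers are illustrative assumptions, not measurements from this portfolio). It compares the sum of per-workload minimums with the minimum of the summed profile:

```python
# Hypothetical hourly usage in instance-equivalents over one day.
# The shapes are invented for illustration; only the comparison matters.
HOURS = range(24)

def core(hour: int) -> int:
    return 300  # flat, 24/7

def data_platform(hour: int) -> int:
    return 150 if 8 <= hour < 18 else 5  # busy during business hours

def batch(hour: int) -> int:
    return 120 if hour < 8 or hour >= 18 else 10  # busy overnight

workloads = [core, data_platform, batch]

# Each workload's own minimum, added up.
sum_of_minimums = sum(min(w(h) for h in HOURS) for w in workloads)

# Minimum of the combined profile: the portfolio baseline.
portfolio_baseline = min(sum(w(h) for w in workloads) for h in HOURS)

print(sum_of_minimums, portfolio_baseline)
```

Because the data platform and the batch pipeline dip at different hours in this toy profile, the combined minimum sits well above the sum of the individual minimums. If every workload dipped in the same hour, the two numbers would be equal.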

Second: the RI Marketplace as a liquidity option. Standard Reserved Instances can be listed for sale on the RI Marketplace when they no longer match the workload. The marketplace isn’t deeply liquid and prices tend to be below remaining value, but it’s a real exit from a commitment that’s gone wrong. Savings Plans can’t be resold; a Savings Plan that stops being useful runs to the end of its term. This asymmetry matters for committing across a portfolio: Standard RIs on highly-likely-to-persist workloads (the core database tier) are fine because they almost certainly won’t need an exit, and they give up nothing the portfolio cares about.

Third: convertibility as insurance. Convertible RIs trade some discount depth for the ability to exchange reservations for different shapes. For workloads where the amount of compute is confident but the shape (instance family, generation) is not, Convertible RIs hold their value across hardware refreshes. The trade is roughly 6 percentage points of discount for the ability to migrate from m6i to m7i to m8i without losing the reservation.

Fourth: Spot diversification at portfolio scale. A single workload on Spot diversifies across instance families and AZs to reduce interruption risk. At portfolio scale, the question is which workloads are candidates for Spot, and the rule is simple: every workload that checkpoints and retries should run on Spot unless a specific reason says otherwise. The CI fleet is an obvious candidate; the EMR analytics jobs are less obvious but most of them checkpoint cleanly. Spot Fleet with a mix of instance types and allocation strategies (price-capacity-optimized, capacity-optimized) keeps eviction rates low while capturing the discount.

Fifth: how the layers relate to each other. A practical portfolio commitment is a stack rather than a single instrument. At the bottom sits the deepest-discount, no-commitment option for anything that tolerates interruption. Above that, a broad commitment covers the aggregate steady baseline at a strong discount, with minimal lock-in on workload shape. A deeper but narrower commitment can sit on top of that for the most stable subset, where the instance shape isn’t expected to move. A flexibility-preserving commitment slots in for workloads where the amount of compute is confident but the hardware generation isn’t. And on-demand handles the residual: dev, peaks, pilot work. Each layer trades discount for flexibility differently, and the portfolio’s overall savings comes from how the layers stack, not from any single one.

Sixth: which container and serverless shapes a given commitment covers. The broadest commitment types cover not just EC2 but Fargate and Lambda too, which matters when a portfolio is gradually containerising or moving to event-driven compute. Narrower commitments are EC2-only and can be stranded by a mid-term migration. The trade between broader coverage and deeper discount is real.

Seventh: the concept of “commitment coverage”. The provider’s cost-explorer tooling reports the fraction of on-demand-equivalent usage that’s been discounted by a commitment. A sensible portfolio target sits well below 100%, because the uncovered fraction is the flex: new workloads, dev/test, peaks above baseline. Chasing 100% coverage is a mistake because the cost of being wrong about the last 20% almost always exceeds the discount gained.

Eighth: cross-account context. Inside an organisation, most commitment instruments can be shared across linked accounts by default. A commitment purchased in one account’s billing can cover matching usage in any other. This is what makes the portfolio-level commitment story coherent: there’s one pool of commitments, not one per account.

What we’ll filter on

Workload stability: same shape for 1 year? 3 years?
Interruption tolerance: can this workload run on Spot?
Instance-family specificity: is the hardware generation fixed, or will it migrate?
Service type: EC2, Fargate, Lambda, or mixed across the term?
Growth projection: flat, growing, shrinking?
Exit options: is there liquidity if the bet goes wrong?

The commitment layering landscape

Spot (up to ~90% off, zero commitment). For interruption-tolerant workloads. No commitment in either direction; AWS reclaims capacity with a 2-minute warning when it needs the capacity back. Diversify across instance families and AZs to spread eviction risk. Fargate has its own Spot variant for ECS-on-Fargate containers.
Compute Savings Plan (up to ~66% off at 3-year all-upfront). Dollar-per-hour commitment, covers any EC2 family, any Region, any size, plus Fargate and Lambda. The broadest SP. Right for covering the portfolio’s aggregate baseline without locking to a specific shape.
EC2 Instance Savings Plan (up to ~72% off at 3-year all-upfront). Commit to a specific instance family in a specific Region. Six percentage points deeper discount than Compute SP in exchange for losing family and Region flexibility. Right for core workloads that are confidently on a specific family in a specific Region for three years.
Standard Reserved Instance (up to ~72% off at 3-year all-upfront). Rigid: family, size, OS, tenancy, Region all locked. Listable on the RI Marketplace. Discount depth matches EC2 Instance SP; the RI Marketplace gives an exit path that EC2 Instance SPs don’t have, at the cost of marketplace liquidity risk.
Convertible Reserved Instance (up to ~66% off at 3-year all-upfront). Exchangeable for a different family, size, OS, or tenancy. Six-ish percentage points below Standard RI depth as the price of exchange rights. Useful when the amount of compute is confident but the shape is not.
On-demand. No commitment. Full price. The correct answer for the residual 20% the portfolio doesn’t want to commit.

Side by side

Option	Discount	Lock-in	Shape flexibility	Fargate/Lambda	Exit option
Spot	~90%	none	any (diversify)	Fargate Spot only	end session
Compute SP	up to ~66%	1 or 3 yr	any family, any Region	✓	none
EC2 Instance SP	up to ~72%	1 or 3 yr	size within family; Region locked	✗	none
Standard RI	up to ~72%	1 or 3 yr	locked family/size/Region	✗	RI Marketplace
Convertible RI	up to ~66%	1 or 3 yr	exchangeable	✗	exchange
On-demand	0%	none	total	native	N/A

Reading the table across the portfolio:

Core services (stable, 18 months history, growing to Fargate): Compute Savings Plan covering the aggregate. Slightly deeper discount available via EC2 Instance SP on the most-stable subset, but Compute SP’s coverage of Fargate during the containerisation migration is worth the 6 percentage points.
Data platform (growing, family-stable on EMR): A mix. Continuous portion on Compute SP (follows containerisation). EMR transient portion on Spot.
Dev/test (unpredictable): On-demand. Committing to dev is committing to a guess.
CI + batch (interruption-tolerant): Fargate Spot for CI, EC2 Spot Fleet for batch. Optional small SP underneath as a floor for the hours that can’t land on Spot.
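The reading above can be sketched as a small decision function. This is one possible encoding of the filters, not a definitive rule set; the `Workload` fields and the `suggest_layer` name are hypothetical, and real decisions would weigh growth and exit options as well.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    interruption_tolerant: bool    # checkpoints and retries cleanly
    stable_for_term: bool          # steady usage expected for the whole term
    family_and_region_fixed: bool  # instance family and Region will not move
    may_move_to_fargate: bool      # containerisation possible mid-term

def suggest_layer(w: Workload) -> str:
    """Map a workload's traits to a commitment layer (illustrative only)."""
    if w.interruption_tolerant:
        return "Spot"
    if not w.stable_for_term:
        return "On-demand"
    if w.family_and_region_fixed and not w.may_move_to_fargate:
        return "EC2 Instance SP"
    return "Compute SP"

portfolio = [
    Workload("core services", False, True, True, True),
    Workload("dev/test", False, False, False, False),
    Workload("CI", True, True, False, False),
]
for w in portfolio:
    print(f"{w.name}: {suggest_layer(w)}")
```

The ordering of the checks carries the design choice: interruption tolerance is tested first because Spot needs no commitment at all, and the narrow EC2 Instance SP is only suggested when nothing about the shape is expected to move.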

The portfolio layering, visually

The portfolio is layered: Spot at the bottom (90% off for interruptible work), Compute SP as the baseline (66% off, broad coverage), EC2 Instance SP at the top of committed usage (72% off for stable shapes), and on-demand as the flex capacity above the commitments.

The commitment portfolio, in depth

The baseline: Compute Savings Plan at 3-year all-upfront. Sized to 55% of the portfolio’s current on-demand-equivalent spend. The figure comes from Cost Explorer’s recommendation tool, which looks at the last 30 days of usage and projects what a commitment at various levels would save. The portfolio-aggregate baseline is steadier than any individual workload’s baseline, because dips in one workload are typically filled by usage elsewhere; the effect is that a larger fraction of the bill is commit-able than workload-by-workload analysis would suggest.

Example numbers (monthly):

Portfolio on-demand-equivalent compute spend:     $62,500/month
Sustained baseline across the portfolio:          $34,400/month (55%)

Compute SP 3-yr all-upfront at $34,400/month     →  $34,400 × 0.34 = $11,700/month actual
Savings vs on-demand at baseline:                    $22,700/month
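The same arithmetic as a short sketch, assuming the maximum ~66% discount applies to the whole baseline (real rates vary by instance type and payment option). The `layer_cost` helper is a name introduced here for illustration.

```python
def layer_cost(on_demand_equivalent: float, discount: float) -> tuple[float, float]:
    """Return (actual cost, savings) for usage priced at a given discount."""
    actual = on_demand_equivalent * (1 - discount)
    return actual, on_demand_equivalent - actual

portfolio_monthly = 62_500
baseline = portfolio_monthly * 0.55  # the figures above round this to the nearest hundred
actual, savings = layer_cost(baseline, 0.66)

print(f"baseline ${baseline:,.0f}  actual ${actual:,.0f}  savings ${savings:,.0f}")
```

The unrounded results land within a few dollars of the figures above; the small gap is only the rounding of the baseline to $34,400.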

The deep-discount layer: EC2 Instance SP on core services. The five core services are on m6i instances in eu-west-1 and have been for 18 months. The product team expects them to stay on that family for the three-year term (any migration to m7i or m8i is likely to come after the term ends). An EC2 Instance SP on m6i covers ~10% of portfolio spend at 72% off:

$6,250/month on-demand-equivalent × 0.28 = $1,750/month actual
Savings: $4,500/month

The 6 percentage points of depth versus Compute SP matter here because the target usage is so predictable. If containerisation surprises the team and these services migrate to Fargate mid-term, the EC2 Instance SP becomes stranded; the Compute SP above (which covers another 55% of the portfolio) follows them, so the risk is bounded.

The interruption layer: Spot across CI and batch. Fargate Spot for CI (ECS task-level); EC2 Spot Fleet for batch (EC2-level with price-capacity-optimized allocation and five instance types across three AZs). The CI workloads are already on Fargate Spot today; the batch pipeline, already on EC2 Spot, moves onto the diversified fleet and keeps its on-demand fallback for when the Spot market can’t fill.

CI + batch on-demand-equivalent:  $9,375/month
Spot savings (~90%):              $9,375 × 0.10 = $937/month actual
Savings:                          $8,438/month
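A sketch of what the diversified batch fleet request might look like. It is shaped for the EC2 `CreateFleet` call rather than the older Spot Fleet request, which is a substitution made here for illustration. The instance types, subnet IDs, launch template name, and the `maintain` fleet type are all placeholder assumptions, and the on-demand fallback is not modelled.

```python
from itertools import product

# Placeholder values: the post does not name the instance types, subnets,
# or launch template, so everything below is hypothetical.
INSTANCE_TYPES = ["m6i.xlarge", "m6a.xlarge", "m5.xlarge", "c6i.xlarge", "r6i.xlarge"]
SUBNET_IDS = ["subnet-aaa", "subnet-bbb", "subnet-ccc"]  # one subnet per AZ

def build_fleet_request(template_name: str, target_capacity: int) -> dict:
    """Build a CreateFleet-style request with one override per (type, subnet) pair."""
    overrides = [
        {"InstanceType": itype, "SubnetId": subnet}
        for itype, subnet in product(INSTANCE_TYPES, SUBNET_IDS)
    ]
    return {
        "Type": "maintain",
        "SpotOptions": {"AllocationStrategy": "price-capacity-optimized"},
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": template_name,
                "Version": "$Latest",
            },
            "Overrides": overrides,
        }],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": target_capacity,
            "DefaultTargetCapacityType": "spot",
        },
    }

request = build_fleet_request("nightly-batch", 80)
print(len(request["LaunchTemplateConfigs"][0]["Overrides"]), "capacity pools")
```

Five types across three subnets gives the allocation strategy fifteen pools to choose from, which is the point of diversifying. With boto3 installed and credentials configured, a dictionary of this shape could be passed as `boto3.client("ec2").create_fleet(**request)`, though that step is left out here so the sketch runs offline.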

The flex layer: on-demand. Dev/test, pilot services, peaks above baseline. Target this at 20% of the portfolio; it’s the buffer against being wrong about the baseline size. Every dollar of Compute SP over-commitment is waste if usage drops; every dollar of on-demand in the flex layer is flexibility. Trading a bit of discount for elasticity is a good trade.

A worked portfolio cost

Three-year portfolio cost at current forecast, on-demand throughout:

Year 1:  $750,000
Year 2:  $825,000  (10% growth)
Year 3:  $907,500  (10% growth)
Total:   $2,482,500

With the commitment layering:

Year 1:
  Spot (~15% @ 90% off):             $112,500 × 0.10 = $11,250
  Compute SP (~55% @ 66% off):       $412,500 × 0.34 = $140,250
  EC2 Instance SP (~10% @ 72% off):  $75,000 × 0.28  = $21,000
  On-demand (~20%):                  $150,000        = $150,000
  Year 1 total:                                        $322,500

Year 2 (growth absorbed by the portfolio, same SP commits):
  Broadly similar but with ~$75,000 of growth on on-demand:
  Year 2 total:                                        $397,500

Year 3 (a further ~$82,500 of growth on on-demand):
  Year 3 total:                                        $480,000

Committed portfolio, 3-year total:                   $1,200,000
Savings vs on-demand:                                $1,282,500  (~52%)
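The Year 1 roll-up and the on-demand comparison can be reproduced with a few lines. This is a sketch under the same simplifying assumptions as the figures above: each layer gets its maximum quoted discount, and the shares are fixed fractions of on-demand-equivalent spend. The `LAYERS` table and `layered_cost` helper are names introduced here.

```python
YEAR1_ON_DEMAND = 750_000

# layer name -> (share of on-demand-equivalent spend, discount)
LAYERS = {
    "Spot": (0.15, 0.90),
    "Compute SP": (0.55, 0.66),
    "EC2 Instance SP": (0.10, 0.72),
    "On-demand": (0.20, 0.00),
}

def layered_cost(on_demand_total: float) -> dict[str, float]:
    """Actual cost per layer for a given on-demand-equivalent total."""
    return {
        name: on_demand_total * share * (1 - discount)
        for name, (share, discount) in LAYERS.items()
    }

year1 = layered_cost(YEAR1_ON_DEMAND)
year1_total = sum(year1.values())

# On-demand throughout, growing 10% a year for three years.
on_demand_by_year = [YEAR1_ON_DEMAND * 1.10 ** n for n in range(3)]

for name, cost in year1.items():
    print(f"{name:<16} ${cost:>10,.0f}")
print(f"{'Year 1 total':<16} ${year1_total:>10,.0f}")
print(f"{'On-demand 3-yr':<16} ${sum(on_demand_by_year):>10,.0f}")
```

Changing a share or a discount in `LAYERS` shows quickly how sensitive the blended figure is to each layer; the on-demand row dominates the total even at a 20% share, because it is the only layer paying full price.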

The commitment portfolio saves just over half the bill. The SP commitments are big but bounded; growth above the baseline rides on-demand and can be folded into the next SP purchase when it stabilises. The Spot layer handles the bulk of batch work at ninety percent off, which is where the deepest individual discounts come from. And the uncommitted flex layer absorbs the surprises without anyone having to justify why a three-year commitment was wrong.

Commitment portfolio hygiene

Three ongoing practices keep the portfolio healthy after the initial purchases:

Monthly coverage review. Cost Explorer’s Savings Plans Coverage report, Organization-wide. Target 75-85% coverage; below 75% means under-committed (next purchase tranche); above 85% means over-committed (some SPs are covering peaks rather than baseline and will strand when usage dips).
Quarterly Spot fleet health check. Interruption rates per instance family, per AZ, per Region. Diversify families or shift AZs when one specific pool’s interruption rate climbs.
Annual re-baseline. Compare the actual year against the projection. Update the next-year commitment plan; expire non-renewing SPs intentionally rather than by surprise; consider Convertible RI exchanges if hardware generations have shifted.
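The monthly coverage check can be sketched as a small classifier. The inputs are assumed to be on-demand-equivalent dollar figures read off the coverage report; the `coverage_status` function is hypothetical, and the example assumes Spot usage is left out of the eligible total because Savings Plans do not apply to it.

```python
def coverage_status(covered_spend: float, eligible_spend: float,
                    low: float = 0.75, high: float = 0.85) -> str:
    """Classify commitment coverage against a target band."""
    if eligible_spend <= 0:
        raise ValueError("eligible_spend must be positive")
    coverage = covered_spend / eligible_spend
    if coverage < low:
        return "under-committed"
    if coverage > high:
        return "over-committed"
    return "in band"

monthly = 62_500
covered = monthly * (0.55 + 0.10)   # Compute SP + EC2 Instance SP layers
eligible = monthly * (1 - 0.15)     # everything except the Spot layer
print(f"{covered / eligible:.1%} -> {coverage_status(covered, eligible)}")
```

Under that assumption, committing 65% of total spend works out to roughly three quarters of the eligible spend, which is how a 65% commitment and a 75-85% coverage target can describe the same portfolio.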

Commitments are not a one-time decision. The portfolio’s shape evolves with the roadmap, and the commitment plan is a rolling instrument that adjusts with it.

What’s worth remembering

Portfolio baseline is steadier than individual-workload baselines. Aggregate spend doesn’t dip as far as any single workload does, because workloads dip at different times. This makes a larger fraction of the bill commit-able than a workload-by-workload analysis would suggest.
Compute SP is the workhorse layer. Covers any EC2 family, any Region, plus Fargate and Lambda. At 3-year all-upfront and ~66% off, it’s the correct default for the aggregate baseline.
EC2 Instance SP and Standard RIs earn the deepest discounts on stable shapes. 72% off in exchange for family and Region lock. Use for the specific workloads that won’t move during the term.
Convertible RIs are insurance against hardware refreshes. Exchange for a different family, size, or OS during the term. Six percentage points of depth sacrificed for exchange rights.
Spot at the bottom captures the deepest discount. Up to ~90% off on anything that checkpoints and retries. Diversify across families and AZs; a small SP underneath as a floor is defensive but usually worth it.
Target 75-85% coverage, not 100%. The uncommitted fraction is the flex capacity: growth, dev/test, pilot services, the workloads whose shape is still being figured out. Chasing higher coverage is almost always a false economy.
Organization-wide sharing makes the portfolio real. SPs and RIs purchased in one linked account cover matching usage across the whole Organization. The portfolio is one pool, not many.
Rebalance annually; don’t re-buy blindly. Actual usage diverges from forecast in both directions. Let expiring commitments be a forcing function to re-evaluate rather than renewing on autopilot.

Fifteen workloads, four accounts, two Regions, three years. The commitment portfolio isn’t a single bet; it’s a layered posture: Spot for the interruptible hours, Compute SP for the aggregate baseline, EC2 Instance SP where the shape is locked, Convertible RI as a hardware-refresh hedge, on-demand for the flex. Each layer pays for a different kind of certainty; each layer has a different exit. Composed correctly, the portfolio gets half the bill off while keeping the roadmap unlocked.