The Archive Nobody Reads

April 27, 2026 · 15 min read

Solutions Architect Associate · SAA-C03 · part of The Exam Room

Some data exists for compliance, not for use. It sits in a bucket for years, untouched, then someone needs it for an audit and there’s a working day to produce it. Across S3’s eight storage classes, exactly one is built for that pattern. Picking the wrong one is the difference between an annual bill of roughly $1,100 and one of $13,800 for the same fifty terabytes – and there are three traps that catch people even when they pick the right class.

The situation

A financial services company holds 50 TB of regulatory archive data in S3 – KYC documents, transaction histories, communications retained under FCA / SEC obligations. The retention period is seven years by regulation. The data is accessed twice a year when external auditors request a sample; the company is given at least 24 hours of notice before each retrieval.

Today it all lives on S3 Standard. That’s a bill in the neighbourhood of $13,800 a year for storage alone, on data that is touched for two days out of every 365 and required to exist for the other 363.

The team’s ask is straightforward. Lowest possible per-GB storage cost over seven years; retrieval feasible inside the 24-hour audit window; standard multi-AZ durability (single-AZ classes are non-starters for regulated data); and minimum operational overhead – no per-object monitoring services, no custom tiering logic, set it once and let lifecycle rules do the work.

What we might want from this

Before touring the storage-class menu it’s worth looking at what the shape of this workload actually rewards and punishes.

The dominant factor over seven years is $/GB/month. The data lives for eighty-four months and is read on two of them. Every fraction of a cent on storage compounds through the entire retention window, while retrieval-side costs scale with reads – a figure measured in single events per year. Any class whose marginal cost is “storage is cheap, retrieval is expensive” is structurally well-matched; any class that keeps the per-GB number high to guarantee instant access is paying for latency the workload will never exploit.

The second factor is retrieval latency and, more specifically, what “fast enough” means. Twenty-four hours is an eternity in software terms. Most S3 classes can return an object in milliseconds; paying the premium for that speed is only sensible if the workload exercises it. Here it doesn’t, so we can move all the way down the cost curve until we hit a class whose retrieval SLA threatens the 24-hour window.

The third factor is durability and availability. Durability – the probability that AWS doesn’t lose our data – is eleven nines on every S3 class, including the single-AZ ones. Availability is where single-AZ classes are dangerous: one AZ outage during an audit is not a story that plays well with a regulator. The scenario explicitly rules out single-AZ, and it’s the right call for regulated data even when it’s allowed.

The fourth factor is operational model. S3 has exactly one class that watches objects and re-tiers them automatically (Intelligent-Tiering), and its watchdog fee is per-object. For a predictable, write-once, read-on-schedule archive, paying a service to discover something we already know is a negative-value feature. The class should be chosen explicitly, not discovered by telemetry.

The fifth factor is the bill-shock traps that don’t show up on the pricing page. Standard-IA, One Zone-IA and Glacier Instant Retrieval bill a minimum of 128 KB per object; Glacier Flexible Retrieval and Deep Archive add roughly 40 KB of per-object overhead (about 8 KB of metadata at S3 Standard rates, about 32 KB of index at the Glacier rate). For an archive of millions of small files that effect is visible on the invoice as billed-GB far higher than stored-GB. Deep Archive has a 180-day minimum storage duration and no Expedited retrieval tier – delete-early and need-it-in-three-hours are both expensive surprises. A realistic cost picture has to price these in, not just the headline $/GB number.

And the sixth factor is a softer one: predictability of the pattern itself. This isn’t a workload where the access shape is going to drift quarter-to-quarter. Regulators will keep asking for the same shape of sample; the retention clock ticks predictably; deletions happen at year seven not year three. The more confident we are that the pattern won’t change, the safer it is to pick the cheapest class that fits – because the expensive classes aren’t buying us flexibility we need, they’re buying latency we don’t.

The attributes that matter

Distilling the exploration into filters:

  1. Lowest $/GB/month – storage dominates the seven-year bill.
  2. Retrieval ≤ 24 hours – anything faster is overkill; anything slower violates the audit SLA.
  3. Multi-AZ durability – single-AZ classes are off the table for regulated data.
  4. No per-object monitoring fee – predictable access patterns don’t benefit from a watchdog.
  5. Survives the small-object trap – if the archive contains millions of tiny files, per-object minimums and overheads can multiply the billed storage several times over.

The S3 storage-class landscape

S3 ships eight storage classes. Each optimises a different point on the cost / latency / durability curve.

1. S3 Standard. ~$0.023/GB/month. Immediate access, no retrieval fees, no minimums. The right home for active workloads. For 50 TB over seven years, about $96,600 in storage alone – most of which buys us nothing.

2. S3 Intelligent-Tiering. Frequent Access matches Standard; 30 days of no access moves objects to Infrequent Access pricing; 90 days to Archive Instant Access; opt-in tiers extend further. No retrieval fees between automatic tiers. The catch: a monitoring fee of ~$0.0025 per 1,000 objects per month. For a predictable archive, we’re paying the watchdog for information we already have.

3. S3 Standard-IA. ~$0.0125/GB/month. 30-day minimum storage. 128 KB minimum chargeable size. Per-GB retrieval fee (~$0.01/GB). For “infrequent but not cold” data read monthly or so – roughly seven times the price of the deepest Glacier tier.

4. S3 One Zone-IA. ~$0.01/GB/month. Same minimums as Standard-IA but stored in a single AZ. 20% cheaper than Standard-IA. Non-starter for regulated data.

5. S3 Glacier Instant Retrieval. ~$0.004/GB/month. Millisecond retrieval, 90-day minimum, 128 KB floor, per-GB retrieval fee. Designed for archive that must be returned immediately when asked – quarterly dashboards, medical images, catalogues that surface in a UI. The 24-hour window makes “millisecond” an overpay.

6. S3 Glacier Flexible Retrieval. ~$0.0036/GB/month. 90-day minimum. Three retrieval tiers: Expedited (1-5 min, premium fee), Standard (3-5 hours, per-GB fee), and Bulk (5-12 hours, free). Within budget, but Deep Archive is cheaper still and the 24-hour window doesn’t reward the flexibility.

7. S3 Glacier Deep Archive. ~$0.00179/GB/month. 180-day minimum. Two retrieval tiers (no Expedited): Standard (within 12 hours, ~$0.02/GB) and Bulk (within 48 hours, ~$0.0025/GB). Purpose-built for data we’ll touch once or twice a year or never.

8. S3 Express One Zone. ~$0.16/GB/month. Sub-millisecond latency, single-AZ, optimised for compute-adjacent high-IOPS workloads. The opposite of an archive class.

The attribute table

Storage class                Lowest $/GB   Retrieval ≤ 24 h       Multi-AZ   No per-object monitoring
S3 Standard                  ✗             ✓ (ms)                 ✓          ✓
Intelligent-Tiering          ✗             ✓ (ms)                 ✓          ✗ (monitoring fee)
Standard-IA                  ✗             ✓ (ms)                 ✓          ✓
One Zone-IA                  ✗             ✓ (ms)                 ✗          ✓
Glacier Instant Retrieval    ✗             ✓ (ms)                 ✓          ✓
Glacier Flexible Retrieval   ✗             ✓ (minutes to hours)   ✓          ✓
Glacier Deep Archive         ✓             ✓ (12 h Standard)      ✓          ✓
Express One Zone             ✗             ✓ (sub-ms)             ✗          ✓

Matching the workload to the class

[Decision flow: the 50 TB, seven-year, two-reads-a-year archive checked against each axis – One Zone-IA fails the multi-AZ requirement (one AZ outage is one missed audit); Glacier Instant Retrieval passes but pays ~2.2× the Deep Archive rate for millisecond retrieval the workload never uses; Glacier Deep Archive passes every check at ~$0.00179/GB/month with 12-hour Standard retrieval, a 180-day minimum that is irrelevant at seven years, a small-file packing caveat, and direct PUT or a Days: 0 lifecycle rule as the route in.]
Slide down the cost axis as far as the retrieval SLA allows: Glacier Deep Archive is the cheapest class whose worst case – 12 hours on the Standard retrieval tier – still fits inside the 24-hour window.

Glacier Deep Archive, in depth

What makes Deep Archive cheap is what makes it slow. Objects aren’t on warm disk – they’re on tiered media optimised for sequential access, and a retrieval is a job AWS schedules against tape-class hardware. The 12-hour Standard retrieval window is the SLA AWS commits to; in practice many retrievals come back faster, but that’s not something to depend on.

The retrieval tiers. Standard costs roughly $0.02/GB and completes within 12 hours. Bulk costs roughly $0.0025/GB and completes within 48 hours. There is no Expedited tier for Deep Archive – that’s exclusive to Flexible Retrieval. Anything faster than 12 hours means picking a different class.

The restore shape. RestoreObject doesn’t move the Deep Archive object permanently. It creates a temporary copy at S3 Standard rates for a number of days we specify (1-365). During that window the object is readable like any Standard object; when the window closes, the copy disappears and the Deep Archive object remains. The bill includes both the base storage and the temporary copy for the restore duration.
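A minimal sketch of that flow with boto3 – the bucket, key, and 14-day window here are placeholder choices, not requirements from the scenario. It requests a Standard-tier restore and then polls HeadObject until the temporary copy is readable.

import time
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "example-regulatory-archive", "kyc/2021/customer-123.pdf"  # hypothetical names

# Kick off the restore: Standard tier, temporary copy kept for 14 days.
s3.restore_object(
    Bucket=BUCKET,
    Key=KEY,
    RestoreRequest={
        "Days": 14,
        "GlacierJobParameters": {"Tier": "Standard"},  # "Expedited" is not valid for Deep Archive
    },
)

# HeadObject exposes the restore status; ongoing-request="false" means the copy is ready.
while True:
    status = s3.head_object(Bucket=BUCKET, Key=KEY).get("Restore", "")
    if 'ongoing-request="false"' in status:
        break
    time.sleep(600)  # a Deep Archive Standard restore is a ~12-hour job, so poll gently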

The 180-day minimum. Deleting before 180 days incurs a charge equal to the remaining days at the Deep Archive rate. For seven-year data this is irrelevant. For a workload that might delete at 60 days, Deep Archive is a trap.

The small-object trap – the most common one. Standard-IA, One Zone-IA and Glacier Instant Retrieval bill a minimum of 128 KB per object. Glacier Flexible Retrieval and Deep Archive don’t have that floor, but every archived object carries roughly 40 KB of overhead – about 8 KB of metadata billed at S3 Standard rates so the object stays listable, plus about 32 KB of index billed at the Glacier rate. For an archive of millions of small files – 50 million 5 KB notification emails, say – that overhead turns 250 GB of data into roughly 2.25 TB of chargeable storage, and the 400 GB Standard-rate slice alone costs more each month than the emails themselves. The mitigation is to pack small objects into larger archives (zip, tar, or a custom format) before upload; a million 5 KB emails packed into 100 MB archives shrinks the per-object overhead to a rounding error.
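A sketch of the packing step, assuming the small files sit in a local directory – the bucket name, source directory, and 100 MB target are illustrative, not prescribed by the scenario:

import tarfile
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "example-regulatory-archive"   # hypothetical bucket
SOURCE = Path("notification-emails")    # hypothetical directory of small files
TARGET = 100 * 1024 * 1024              # aim for roughly 100 MB per packed archive

def flush(batch, part):
    """Tar one batch of small files and upload it straight into Deep Archive."""
    archive = Path(f"emails-part-{part:05d}.tar")
    with tarfile.open(archive, "w") as tar:
        for path in batch:
            tar.add(path, arcname=path.name)
    s3.upload_file(
        str(archive), BUCKET, f"packed/{archive.name}",
        ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},  # direct to Deep Archive, no Standard hop
    )
    archive.unlink()

batch, size, part = [], 0, 0
for path in sorted(SOURCE.rglob("*.eml")):
    batch.append(path)
    size += path.stat().st_size
    if size >= TARGET:
        flush(batch, part)
        batch, size, part = [], 0, part + 1
if batch:
    flush(batch, part)

The trade-off is retrieval granularity: pulling one email later means restoring and unpacking its whole tarball, which is fine as long as a single audit request maps to a manageable number of archives.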

A worked example: one year of bill shape

50 TB = 50,000 GB. Assume objects are packed large enough that per-object overhead doesn’t dominate.

Storage at Deep Archive
  50,000 GB × $0.00179 × 12                        =  $1,074

Retrieval (twice a year, 50 TB returned per audit)
  Standard retrieval: 50,000 × $0.02 × 2           =  $2,000

Restore staging (14-day window each audit)
  50,000 × $0.023 × (14/30) × 2                    =  $1,073

Total annual                                          $4,147

Comparable S3 Standard bill:
  50,000 × $0.023 × 12                             =  $13,800
Saving                                                ~70%

Over seven years that’s roughly a $67,000 difference, paid for by tolerating a 12-hour retrieval window that the 24-hour audit SLA covers with twelve hours of headroom. If the auditor would accept a 48-hour window, Bulk retrieval cuts the retrieval line to about $250 and the annual bill lands near $2,400 – but the scenario’s 24-hour notice doesn’t tolerate Bulk’s worst case.
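The same bill shape as a small, re-runnable sketch – the unit prices are the approximate figures used above and vary by region, so treat them as assumptions to edit rather than quotes:

# Approximate unit prices from the worked example above (region-dependent).
GB = 50_000                                       # 50 TB archive
DEEP_ARCHIVE = 0.00179                            # $/GB-month storage
STANDARD = 0.023                                  # $/GB-month storage; also prices the restore staging copy
RETRIEVAL = {"standard": 0.02, "bulk": 0.0025}    # $/GB per restore, by tier
AUDITS_PER_YEAR = 2
RESTORE_DAYS = 14                                 # how long the temporary copy is kept per audit

def annual_bill(tier: str) -> float:
    storage = GB * DEEP_ARCHIVE * 12
    retrieval = GB * RETRIEVAL[tier] * AUDITS_PER_YEAR
    staging = GB * STANDARD * (RESTORE_DAYS / 30) * AUDITS_PER_YEAR
    return storage + retrieval + staging

print(f"Deep Archive, Standard tier: ${annual_bill('standard'):>8,.0f}")  # ≈ $4,147
print(f"Deep Archive, Bulk tier:     ${annual_bill('bulk'):>8,.0f}")      # ≈ $2,397
print(f"S3 Standard baseline:        ${GB * STANDARD * 12:>8,.0f}")       # ≈ $13,800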

Getting data into Deep Archive

Three routes in, with different cost shapes.

Direct PUT to Deep Archive. Set x-amz-storage-class: DEEP_ARCHIVE on the upload request. The object lands directly in Deep Archive – no Standard hop, no transition fee. Requires the uploader to know about the storage class. Best when the data is born archival.
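With boto3 that’s a single parameter on the upload call – the SDK sets the x-amz-storage-class header for us. Bucket and key here are placeholders:

import boto3

s3 = boto3.client("s3")

# The object is born in Deep Archive: no Standard hop, no lifecycle transition fee.
with open("customer-123.pdf", "rb") as body:
    s3.put_object(
        Bucket="example-regulatory-archive",   # hypothetical bucket
        Key="kyc/2026/customer-123.pdf",
        Body=body,
        StorageClass="DEEP_ARCHIVE",
    )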

Lifecycle transition from Standard. Upload to Standard, then a lifecycle rule moves objects to Deep Archive after a configured number of days. Transition costs roughly $0.05 per 1,000 objects. For 50 million objects that’s $2,500 one-off – real money but amortised.

LifecycleConfiguration:
  Rules:
    - Id: ArchiveImmediately
      Status: Enabled
      Filter:
        Prefix: kyc/
      Transitions:
        - Days: 0
          StorageClass: DEEP_ARCHIVE

A Days: 0 transition fires within the next lifecycle evaluation cycle (once per day), so it’s not instant – there’s typically a one-day window where the object sits in Standard first. For workloads where that day matters, prefer direct PUT.
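The same rule expressed as an API call rather than a template – a minimal boto3 sketch against a hypothetical bucket, mirroring the YAML above:

import boto3

s3 = boto3.client("s3")

# Everything under the kyc/ prefix transitions to Deep Archive at day zero.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-regulatory-archive",   # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "ArchiveImmediately",
                "Status": "Enabled",
                "Filter": {"Prefix": "kyc/"},
                "Transitions": [{"Days": 0, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    },
)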

Intelligent-Tiering with Deep Archive Access opt-in. Let S3 discover the access pattern; untouched objects eventually reach Deep Archive Access after 180+ days of inactivity. Plus the monitoring fee per object. For a workload where the pattern is already known, overkill.
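For completeness, the opt-in is a bucket-level configuration – a boto3 sketch with a hypothetical bucket name; the ~$0.0025 per 1,000 objects per month monitoring fee applies regardless of which tiers are enabled:

import boto3

s3 = boto3.client("s3")

# Opt untouched objects into the archive tiers after 90 and 180 days of no access.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-regulatory-archive",   # hypothetical bucket
    Id="archive-opt-in",
    IntelligentTieringConfiguration={
        "Id": "archive-opt-in",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)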

What’s worth remembering

  1. Eight S3 storage classes exist, each optimising a different point on cost / latency / durability / AZ-count. For a seven-year compliance archive touched twice a year, Deep Archive is the one designed for the shape.
  2. The small-object trap is the most common bill-shock: a 128 KB minimum billable size on the IA and Glacier Instant Retrieval classes, plus roughly 40 KB of per-object overhead on Glacier Flexible Retrieval and Deep Archive. Pack small files into larger archives before upload.
  3. Glacier Flexible Retrieval has Expedited (1-5 min); Deep Archive doesn’t. If anything faster than 12 hours is required, pick Flexible, not Deep.
  4. Minimum storage durations matter. 30 days on IA, 90 on Glacier Instant / Flexible, 180 on Deep Archive. Early deletion charges the remaining days.
  5. RestoreObject creates a temporary S3 Standard copy for the duration specified. The underlying Glacier object is unchanged; the bill includes both for the restore window.
  6. Intelligent-Tiering’s monitoring fee is per object, not per GB. Predictable-access archives don’t benefit from a watchdog, and billions of small objects make the watchdog expensive.
  7. Lifecycle transitions have per-1,000-object costs that matter at scale. Direct PUT skips them.
  8. Single-AZ classes share the 11-nines durability but drop availability. For regulated data the availability gap is disqualifying even when durability is fine on paper.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.