How to Tier OpenSearch Shards Across Hot, Warm, and Cold

December 22, 2027 · 16 min read

Data Engineer · DEA-C01 · part of The Exam Room

The situation

Our observability cluster holds 730 days of application logs across 50 services. Three years ago we set a retention policy at two years and haven’t revisited it. The index pattern is daily – app-logs-2027.11.27, app-logs-2027.11.28, …, with about 40 GB per index. Total: ~30 TB, all on hot nodes, on r6g.2xlarge.search instances with attached EBS gp3 storage.

A typical day’s queries:

  • Support troubleshooting: last 7 days, several hundred queries a day. Filters by service, trace ID, or user ID; aggregations over time to spot error spikes. Latency matters; support is on a live call.
  • SRE post-mortems: last 30 days, a few dozen queries a week. Similar shape, a bit more aggregation-heavy.
  • Security investigations: occasionally reach back 90 days, sometimes further. Maybe 5-10 deep queries a month.
  • Compliance and audit: reach back up to 2 years. A handful of queries a quarter, usually very targeted (customer ID, timestamp range).
  • Analytics dashboards: last 24 hours only, continuously. Refreshes every minute, a handful of named queries.

The access pattern is heavily skewed toward recent. The last 7 days see 90% of queries; 8-30 days see 9%; 30-730 days see 1%. But all 730 days are stored on the same expensive tier as today’s, with the same hot-node compute attached.

The bill is dominated by hot storage we’re not using. Tiered storage exists exactly for this shape.
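To make the skew concrete, here is a small sketch that sets each age band's share of storage against its share of queries. It uses only the figures quoted above (40 GB per daily index, 730 days, the 90/9/1 query split); the function and constant names are made up for illustration.

```python
# Storage share vs. query share per age band, using the figures quoted in the text.
DAILY_GB = 40

# (label, days in band, share of queries) as stated above.
BANDS = [
    ("0-7 days", 7, 0.90),
    ("8-30 days", 23, 0.09),
    ("30-730 days", 700, 0.01),
]


def skew_table(bands=BANDS, daily_gb=DAILY_GB):
    """Return (label, stored_gb, storage_share, query_share) for each band."""
    total_gb = sum(days for _, days, _ in bands) * daily_gb
    rows = []
    for label, days, query_share in bands:
        stored_gb = days * daily_gb
        rows.append((label, stored_gb, stored_gb / total_gb, query_share))
    return rows


if __name__ == "__main__":
    for label, gb, storage_share, query_share in skew_table():
        print(f"{label:>12}: {gb:>6} GB  "
              f"{storage_share:6.1%} of storage  {query_share:4.0%} of queries")
```

The band that answers nine in ten queries holds about one percent of the stored data, which is the mismatch the rest of this post is about.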

What actually matters

Before reaching for a tiering plan, it’s worth asking what “tiered” actually buys and what it costs.

Tiered storage separates storage from compute at each tier boundary. The active tier holds indices on node-local storage so every node is serving both queries and indexing. An intermediate tier detaches storage from node-local and puts it on cheap object storage, with a separate compute fleet that caches portions locally when queried. An archival tier keeps indices on object storage without any compute attached at all; the index has to be re-attached before it can be queried, a minutes-long operation.

The trade-off at each boundary:

Active to intermediate. Storage cost drops significantly. Query latency rises: first-time queries have to fetch shards from object storage into the intermediate-tier cache, which takes seconds to minutes depending on shard size. Indexing is no longer available on the index, because the intermediate tier is read-only. In exchange, the active tier carries only the indices being actively written and frequently queried, which means smaller and cheaper active capacity.

Intermediate to archival. Storage cost drops further. Query latency rises to “wait while the index is re-attached”: archival indices aren’t queryable without promoting them back to the intermediate tier first. In exchange, retention becomes cheap enough to hold years rather than months, which matters when compliance sets the retention clock rather than operational needs.

The decision at each boundary is the same shape: at what age does this data stop being queried frequently enough to justify active-tier cost? And at what age does it stop being queried at all for practical purposes? If the answer is “active for 7 days, intermediate for 30, archival forever” we pay roughly for seven days of active, thirty of intermediate, and object-storage rates for the rest.

The adjacent consideration is index lifecycle automation. Moving indices between tiers manually at 730 indices is untenable. A policy engine, built from conditions (age, size, doc count), actions (rollover, move down a tier, delete), and transitions between states, lets us author the rules once and have the cluster enforce them continuously. Without that automation the tiering plan is theoretical.

What we’ll filter on

Distilling that into filters we can score each tier against:

  1. Query latency expectation: sub-second, seconds, or “wait to be loaded”?
  2. Indexing availability: can new documents be written?
  3. Storage cost profile: attached EBS rates, UltraWarm rates, or S3 rates?
  4. Query access: immediate, or requires restore/attach? Can a user query the index without an operator action?
  5. Fit for the access pattern: heavy-query, occasional-query, or archival?

The tiered landscape

  1. Hot tier. Indices stored as traditional shards across data nodes with attached EBS. Full read and write. Sub-second query latency for typical searches. Cost is the data nodes’ instance hours plus EBS storage rates. The default, and where an index starts its life.

  2. UltraWarm tier. Indices moved to S3-backed storage with a dedicated fleet of UltraWarm nodes (a separate instance pool from hot data nodes). Read-only: no new documents can be indexed. Query latency is higher than hot: first-query latency can be several seconds to minutes as shards are fetched from S3 into UltraWarm’s cache; subsequent queries are faster. UltraWarm nodes come in specific sizes (ultrawarm1.medium.search, ultrawarm1.large.search) with fixed storage-to-compute ratios. Storage cost is S3-backed and substantially cheaper per GB than hot EBS.

  3. Cold tier. Indices stored in S3 only, with no UltraWarm compute attached. Read-only. Not queryable while cold: to query, an index must be “attached” to UltraWarm, which takes a few minutes per index. Storage cost is pure S3 rates. Retained as long as needed.

  4. Snapshot + restore (adjacent). Manual snapshots to an S3 repository, restored either to a new cluster or to the same cluster as a new index. Different shape: a snapshot is a backup, not a queryable tier. Useful for disaster recovery and for off-cluster retention beyond what tiered storage supports; not useful as the primary “cold” answer because every restore is a manual operation.

  5. OpenSearch Serverless (adjacent). A different product altogether, with its own pricing and architecture (OCUs instead of instance hours). Automatic scaling, no tier management; cost model is different enough that it’s a separate decision. Not a tier of the managed-cluster product.

Side by side

| Tier | Query latency | Indexable | Storage cost | Query access | Fits access pattern |
|---|---|---|---|---|---|
| Hot | Sub-second | Yes | EBS attached rates | Immediate | Active write + heavy read |
| UltraWarm | Seconds first query, faster after | No | S3-backed, much cheaper | Immediate | Occasional query, no new writes |
| Cold | Attach required (minutes) | No | S3 rates | Requires attach | Archival, rare query |
| Snapshot (repository) | Restore required | No | S3 rates | Requires restore to cluster | Backup / DR |

Reading by data age in our log-cluster example:

  • 0-7 days → Hot. Active writes (today’s index is being indexed), heavy read load from support and dashboards. Pay hot rates; this is where the tier is earning its price.
  • 7-30 days → UltraWarm. No more writes (the index rolled over days ago), moderate read load from SRE and post-mortems. Pay UltraWarm rates; accept the first-query latency bump.
  • 30-730 days → Cold. Rare compliance and security queries. Pay S3 rates; accept the “attach before query” step for the minutes-level latency during an investigation.
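The age-to-tier mapping above can be sketched as a small function. This is an illustration of the boundaries in this post (7, 30, and 730 days) rather than anything the cluster runs; the helper names are hypothetical, and the date parsing assumes the app-logs-YYYY.MM.DD naming shown earlier.

```python
from datetime import date, datetime

# Tier boundaries from this post's plan, in days since index creation.
HOT_DAYS = 7
WARM_DAYS = 30
RETENTION_DAYS = 730


def tier_for_age(age_days: int) -> str:
    """Map an index age in days to the tier the plan puts it in."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < WARM_DAYS:
        return "warm"
    if age_days < RETENTION_DAYS:
        return "cold"
    return "delete"


def index_age_days(index_name: str, today: date) -> int:
    """Age of a daily index, read from its app-logs-YYYY.MM.DD name."""
    created = datetime.strptime(index_name, "app-logs-%Y.%m.%d").date()
    return (today - created).days


if __name__ == "__main__":
    today = date(2027, 12, 22)
    for name in ("app-logs-2027.12.20", "app-logs-2027.11.27", "app-logs-2026.06.01"):
        print(name, "->", tier_for_age(index_age_days(name, today)))
```

An index that is exactly 7 days old lands in warm here, matching a transition that fires once the minimum age is reached.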

The ISM policy that drives it all

Index State Management is the policy engine that moves indices between tiers on age or size triggers. A policy has states, actions (what to do in each state), and transitions (conditions that move an index from one state to the next). Policies are attached to indices by name pattern, and ISM evaluates them in the background on a recurring schedule.

A log-cluster policy looks roughly like the following. When it is created with PUT _plugins/_ism/policies/app-logs-lifecycle, the policy ID goes in the URL path and the document is wrapped in a top-level "policy" key:

{
  "policy": {
    "description": "hot 7d, warm to 30d, cold to 730d, delete",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "warm", "conditions": { "min_index_age": "7d" } }
        ]
      },
      {
        "name": "warm",
        "actions": [{ "warm_migration": {} }],
        "transitions": [
          { "state_name": "cold", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "cold",
        "actions": [{ "cold_migration": { "timestamp_field": "@timestamp" } }],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "730d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [{ "cold_delete": {} }],
        "transitions": []
      }
    ],
    "ism_template": [
      { "index_patterns": ["app-logs-*"], "priority": 100 }
    ]
  }
}

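The policy attaches automatically to any newly created app-logs-* index via the ism_template. Indices that already exist when the policy is created are not picked up by the template and need the policy attached explicitly. The policy runs in the background; transitions happen asynchronously. The cluster’s ISM dashboard shows every index’s current state and next scheduled transition.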
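A sketch of the two requests involved in registering the policy and attaching it to indices that already exist (ism_template only matches indices created afterwards). It builds the requests without sending them, so it makes no assumption about your HTTP client or auth. The paths follow the OpenSearch ISM API (_plugins/_ism); older domains may use a different prefix, so treat the paths as something to verify against your version. The function names are made up for illustration.

```python
import json

POLICY_ID = "app-logs-lifecycle"


def put_policy_request(policy_id: str, policy_body: dict) -> tuple[str, str, str]:
    """Build (method, path, json_body) for creating an ISM policy.

    The policy ID travels in the URL path; the body nests the policy
    document under a top-level "policy" key.
    """
    path = f"_plugins/_ism/policies/{policy_id}"
    return ("PUT", path, json.dumps({"policy": policy_body}))


def attach_existing_request(index_pattern: str, policy_id: str) -> tuple[str, str, str]:
    """Build the request that attaches a policy to already-existing indices."""
    path = f"_plugins/_ism/add/{index_pattern}"
    return ("POST", path, json.dumps({"policy_id": policy_id}))


if __name__ == "__main__":
    # Trimmed policy body, just enough to show the request shape.
    body = {"description": "hot 7d, warm to 30d, cold to 730d, delete",
            "default_state": "hot",
            "states": []}
    for method, path, payload in (
        put_policy_request(POLICY_ID, body),
        attach_existing_request("app-logs-*", POLICY_ID),
    ):
        print(method, path)
        print(payload)
```

Keeping request construction separate from sending makes the plan easy to review before anything touches a 730-index backlog.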

The tiered shape, visualised

[Figure: Index lifecycle across tiers (730-day retention, daily indices). Hot, day 0 → 7: data nodes + EBS, read + write, sub-second queries, ~40 GB × 7 days. UltraWarm, day 7 → 30: S3-backed + UltraWarm compute, read-only, first-query latency in seconds, ~40 GB × 23 days. Cold, day 30 → 730: S3 only, no compute attached, not queryable until re-attached (minutes), ~40 GB × 700 days. Delete at 730d: ISM action removes the index from the cluster. Illustrative relative storage cost per GB-month: hot 1, UltraWarm ~1/5 of hot, cold ~1/20 of hot; exact ratios depend on instance choice and Region.]
Three tiers, one ISM policy. Hot for the actively-written and heavily-queried days; UltraWarm for the occasionally-queried middle; cold for the rarely-touched long tail. Cost drops dramatically with each boundary; latency rises.

UltraWarm in depth

Node types. UltraWarm nodes are a separate instance pool from data nodes, provisioned as part of the domain configuration. Sizes are fixed bundles of compute and addressable storage: ultrawarm1.medium.search (up to 1.5 TB of warm storage per node) and ultrawarm1.large.search (up to 20 TB per node). Those figures are the maximum amount of S3-backed data each node can serve, not the size of its local cache. You provision nodes based on how much UltraWarm storage you hold; dividing that by the per-node maximum gives the minimum node count.
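The sizing rule can be sketched as a ceiling division. The per-node figures are the 1.5 and 20 quoted above; the floor of two nodes is an assumption of this sketch (it mirrors the “call it two for HA” choice later in this post), and the function name is hypothetical.

```python
import math

# Per-node storage figures as quoted in the text, in TB.
NODE_STORAGE_TB = {
    "ultrawarm1.medium.search": 1.5,
    "ultrawarm1.large.search": 20.0,
}


def warm_nodes_needed(warm_storage_tb: float, instance_type: str, min_nodes: int = 2) -> int:
    """Smallest node count whose combined per-node storage covers the warm data.

    min_nodes=2 is an assumption of this sketch, not a value from the text.
    """
    per_node_tb = NODE_STORAGE_TB[instance_type]
    return max(min_nodes, math.ceil(warm_storage_tb / per_node_tb))


if __name__ == "__main__":
    # 23 days x 40 GB = 920 GB of warm data in this post's example.
    print(warm_nodes_needed(0.92, "ultrawarm1.medium.search"))
    print(warm_nodes_needed(30.0, "ultrawarm1.medium.search"))
    print(warm_nodes_needed(30.0, "ultrawarm1.large.search"))
```

Sizing by storage rather than by query volume is the point: the node count follows from how much warm data there is.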

Migration mechanics. The warm_migration ISM action (or the equivalent API call, POST _ultrawarm/migration/<index>/_warm) makes a hot index read-only, copies its segments to S3, and removes it from the hot nodes. The index then shows up as warm (GET _cat/indices/_warm lists warm indices); queries against it route through UltraWarm nodes, which fetch segments from S3 on demand and cache them. Migration time scales with shard size.

Caching behaviour. UltraWarm nodes have local cache storage. Recently-queried segments sit in cache; cold segments get evicted. A query against an index not recently touched re-fetches from S3 (the first-query latency bump); subsequent queries against the same segments are faster. For predictable query patterns, cache hit rates are high; for one-off searches against old data, every query pays the fetch cost.
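A toy model of that behaviour, to show why the first query pays and repeats don't. It assumes a least-recently-used eviction policy, which is purely an assumption for illustration: nothing here describes how UltraWarm actually manages its cache, and the class name is made up.

```python
from collections import OrderedDict


class SegmentCache:
    """Toy LRU cache: a read is served from "cache" on a hit, "s3" on a miss."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._segments = OrderedDict()  # segment_id -> True, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, segment_id: str) -> str:
        if segment_id in self._segments:
            # Hit: mark as most recently used.
            self._segments.move_to_end(segment_id)
            self.hits += 1
            return "cache"
        # Miss: fetch from object storage, then cache it.
        self.misses += 1
        self._segments[segment_id] = True
        if len(self._segments) > self.capacity:
            # Evict the least recently used segment.
            self._segments.popitem(last=False)
        return "s3"


if __name__ == "__main__":
    cache = SegmentCache(capacity=2)
    for seg in ["a", "a", "b", "c", "a"]:
        print(seg, "->", cache.read(seg))
    print("hits:", cache.hits, "misses:", cache.misses)
```

Repeated reads of the same segment hit the cache; a one-off sweep across many old segments misses every time, which is the pattern that makes ad hoc searches against old data feel slow.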

What works, what doesn’t. All standard search queries and aggregations work. Dashboards querying UltraWarm indices work. Indexing doesn’t: write requests to a warm index fail. Re-indexing from a warm source to a new hot index works (read from warm, write to new hot). Delete-by-query is a write operation, so don’t rely on it against a warm index; expect to migrate the index back to hot first (check the current docs).

Cold in depth

Attach/detach mechanics. Cold indices are S3-stored but not attached to UltraWarm compute. To query them, an operator (or automation) issues a POST _cold/migration/_warm call with the index names in the request body, which re-attaches the indices to UltraWarm. Attach time is minutes per index (depends on shard count and size). Once attached, the index behaves as an UltraWarm index. A POST _ultrawarm/migration/<index>/_cold detaches it again.

Typical pattern for investigations. A security query reaches back 60 days. The operator (or automation) identifies the cold indices for the target date range, issues a bulk attach, waits a few minutes, runs the query, optionally detaches after. Cold isn’t meant for interactive use; it’s meant for indices that might need to be queried but usually aren’t, and where the attach overhead is acceptable because the query is already expected to be a deliberate action.

ISM automation for cold. The cold_migration action moves indices to cold; the warm_migration action on a cold index attaches it back to UltraWarm. Investigations can be half-automated: a Lambda triggered by a security ticket attaches the relevant cold indices, runs a pre-canned query, posts results back, and detaches. Humans never type the attach command.
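A sketch of the planning half of that automation: turn an investigation's date range into daily index names, then build one attach request per index. Nothing is sent. The request shape (POST _cold/migration/_warm with the index name in an "indices" body field, and POST _ultrawarm/migration/<index>/_cold with a timestamp_field to detach) is my reading of the Amazon OpenSearch Service cold storage API and should be checked against the current docs; the function names are hypothetical.

```python
import json
from datetime import date, timedelta


def indices_for_range(start: date, end: date, prefix: str = "app-logs-") -> list[str]:
    """Daily index names covering start..end inclusive."""
    days = (end - start).days
    return [
        f"{prefix}{(start + timedelta(days=n)).strftime('%Y.%m.%d')}"
        for n in range(days + 1)
    ]


def attach_requests(indices: list[str]) -> list[tuple[str, str, str]]:
    """One cold-to-warm request per index (request shape is an assumption)."""
    return [
        ("POST", "_cold/migration/_warm", json.dumps({"indices": name}))
        for name in indices
    ]


def detach_requests(indices: list[str], timestamp_field: str = "@timestamp") -> list[tuple[str, str, str]]:
    """One warm-to-cold request per index, for cleaning up afterwards."""
    return [
        ("POST", f"_ultrawarm/migration/{name}/_cold",
         json.dumps({"timestamp_field": timestamp_field}))
        for name in indices
    ]


if __name__ == "__main__":
    names = indices_for_range(date(2027, 10, 23), date(2027, 10, 25))
    for method, path, body in attach_requests(names):
        print(method, path, body)
```

Issuing one request per index is the conservative choice here; it also makes it easy to retry a single failed attach without repeating the rest.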

A worked bill shape

30 TB total, split across tiers per the policy:

  • Hot: 40 GB/day × 7 days = 280 GB. On r6g.2xlarge.search with attached EBS gp3. Maybe 6 hot data nodes total for the workload (including headroom); each carrying indices plus overhead.
  • UltraWarm: 40 GB × 23 days = 920 GB. Fits in one ultrawarm1.medium.search node with room; call it two for HA.
  • Cold: 40 GB × 700 days = 28 TB. Pure S3 cost. No UltraWarm nodes required for cold indices.

The hot-node count drops significantly compared to the “all-hot” baseline (which would need hot nodes sized for 30 TB of storage). That is where the real saving lives: hot storage isn’t just EBS, it’s also the instance hours to serve it. UltraWarm adds its own node cost, but that’s offset by per-GB storage that is roughly five times cheaper. Cold is almost free in comparison: just the S3 storage rate.
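The storage side of that bill can be sketched with the illustrative ratios from the figure (UltraWarm at about 1/5 of hot per GB, cold at about 1/20). These are relative units, not prices, and the sketch deliberately ignores instance hours, so it understates the hot-node saving. All names are made up for illustration.

```python
DAILY_GB = 40

# Days each tier holds, per the policy in this post.
TIER_DAYS = {"hot": 7, "ultrawarm": 23, "cold": 700}

# Illustrative per-GB storage cost relative to hot; not real prices.
RELATIVE_COST = {"hot": 1.0, "ultrawarm": 1 / 5, "cold": 1 / 20}


def tier_storage_gb(tier_days=TIER_DAYS, daily_gb=DAILY_GB):
    """GB held in each tier."""
    return {tier: days * daily_gb for tier, days in tier_days.items()}


def relative_storage_cost(storage_gb, relative_cost=RELATIVE_COST):
    """Storage cost per tier in 'hot GB-equivalents'."""
    return {tier: gb * relative_cost[tier] for tier, gb in storage_gb.items()}


def tiered_vs_all_hot(storage_gb, relative_cost=RELATIVE_COST):
    """Tiered storage cost as a fraction of keeping everything hot."""
    tiered = sum(relative_storage_cost(storage_gb, relative_cost).values())
    all_hot = sum(storage_gb.values()) * relative_cost["hot"]
    return tiered / all_hot


if __name__ == "__main__":
    storage = tier_storage_gb()
    for tier, cost in relative_storage_cost(storage).items():
        print(f"{tier:>9}: {storage[tier]:>6} GB -> {cost:8.1f} hot-GB equivalents")
    print(f"tiered / all-hot storage cost: {tiered_vs_all_hot(storage):.1%}")
```

Swapping in real per-GB rates for your Region and adding node hours would turn this from a shape into an estimate.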

Last-7-days support queries get faster, because hot nodes carry less data and their caches stay warmer. Post-mortem queries against UltraWarm pay a latency bump on first access. Cold queries require a deliberate attach, which is acceptable because they’re already deliberate.

What’s worth remembering

  1. Tiered storage maps to access pattern, not just data age. The question is “how often is this queried”, not “how old is this”. The ISM policy uses age as a proxy; if the real pattern differs (e.g. month-end reports pulling back 30-60 days), tune the tier boundaries accordingly.
  2. Hot / UltraWarm / cold are three separate tiers, not one gradient. Hot is full read+write sub-second. UltraWarm is read-only with first-query latency. Cold is not queryable without re-attach. Each has its own cost and capability.
  3. UltraWarm is not a backup. It’s a tier of the live cluster with S3-backed storage and cached compute. Snapshots to an S3 repository are the backup story; different mechanism, different purpose.
  4. Index State Management drives it automatically. Policy states with age/size conditions trigger warm/cold/delete actions. Apply by name pattern via ism_template; never manually move indices across tiers for a real workload.
  5. UltraWarm and cold need their own node provisioning. UltraWarm nodes are a separate fleet from hot data nodes; cold has no compute attached but requires UltraWarm to be queried. Size the UltraWarm fleet to your warm storage, not to query volume.
  6. First-query latency is real. The first query against a warm index that isn’t in cache takes seconds to minutes. Plan for it in dashboards that might touch older data; cache-warm them after a tier migration if it matters.
  7. Cold queries are deliberate. Automation for investigations (attach, query, detach) is the correct pattern; humans shouldn’t be learning the attach API under pressure.
  8. OpenSearch Serverless is a separate decision. Automatic scaling, no tier management, different pricing. If the cluster is hard to size and doesn’t need heavy tuning, it’s worth evaluating; it isn’t a tier of the managed-cluster product.

Tiered storage is how a log retention policy stops being “how much hot storage can we afford” and starts being “how far back does each question reach?”. The answer is usually short for most questions, long for a few, and the cost shape matches the question shape. ISM is the automation that keeps it honest at scale; UltraWarm and cold are the tiers that absorb the data the hot cluster shouldn’t be paying for.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.