The situation
A product team runs three services in front of the same relational database and the SRE partner has asked the question every SRE partner eventually asks: can we take the database read load down without losing anything we care about?
- A catalogue API: serves 3,000 requests per second of GET /products/:id. The underlying row changes maybe twice a day when the merchandiser updates copy or price. The API’s p95 latency is 90ms, most of it spent in a ten-way join behind the row. Reads dominate writes by a thousand to one.
- An inventory service: keeps a per-SKU available_quantity. The warehouse decrements it as orders fulfil and increments it as returns arrive. Reads and writes are roughly balanced. A stale read by five seconds means a customer occasionally sees “4 in stock” when there are 3: acceptable for display, unacceptable for the checkout guardrail.
- A leaderboard service: computes the top-100 scores for the last 24 hours. Writes arrive at 500 per second; reads at 100 per second. The leaderboard page must reflect a score within one second of the game reporting it.
Everything today runs against the primary RDS instance at roughly 70% CPU. The question isn’t whether a cache helps (it does); it’s which caching pattern fits each workload’s read-write shape without introducing bugs the team will be chasing for months.
What actually matters
Before reaching for ElastiCache, it’s worth asking what we’re actually trading.
The core trade in caching is freshness against load. A cache lowers load on the underlying store by answering reads from memory. The cost is that the cached value can diverge from the underlying store between the moment it was cached and the moment it’s read. Every caching pattern is a different answer to the same question: how much staleness is tolerable, and who pays to reduce it?
The first thing to ask is: on a cache miss, who fills the cache? Some patterns make the application responsible: read the cache, miss, read the database, write the cache, return. Others make the cache library responsible: the application asks the cache, and the cache fetches from the database itself. The former gives the application full control over key shape and serialisation; the latter gives a cleaner programming model at the cost of tighter coupling.
The second thing to ask is: on a write, what happens to the cached value? We can invalidate (delete the entry so the next read misses), update (write the new value to both cache and database), or write to the cache and let it flush to the database asynchronously. Each choice changes the window during which a reader could see a stale value.
The third is consistency guarantees. Some patterns keep the cache as fresh as the last successful write; some make the cache at most TTL-seconds stale; some are eventually consistent, converging on whatever flush interval the team chose. The application’s tolerance for each of those windows determines the pattern.
The fourth is what “failure” looks like. When the cache is down, does the application fall back to the database gracefully, or does the request fail? When a write succeeds to the database but fails to the cache (or vice versa), does the next reader see an old value or a missing one? Caches fail; patterns differ in how loudly they fail.
The fifth is write amplification. Some patterns double the write cost: every database write is also a cache write. Others delay it. Others avoid it entirely, leaving the cache untouched on write until the next reader misses and refills. The pattern has to match the write rate or it burns cycles at the cache just like it did at the database.
And finally, a softer one: what the application actually caches. The expensive thing isn’t always the row. Sometimes it’s the rendered HTML, sometimes it’s a graph traversal, sometimes it’s a signed URL. The pattern fits the shape of the computed result, not just the shape of the row.
What we’ll filter on
Distilling that exploration into filters we can score each pattern against:
- Freshness guarantee: at most how stale can a read be?
- Miss handling: who reads the database, application or cache?
- Write cost: what does a single write do to the system?
- Failure behaviour: what happens if the cache is unreachable?
- Complexity in the application: how much logic sits in the code path?
The pattern landscape
- Cache-aside (lazy loading). The application reads the cache first; on miss, reads the database, writes the value into the cache with a TTL, and returns. On writes, the application either does nothing to the cache (letting the TTL expire the stale value) or explicitly invalidates the key. The cache holds only values that have actually been requested; cold starts are slow because the cache is empty; a cache outage falls back to the database. Freshness is bounded by the TTL and by how aggressively writes invalidate. The default pattern in Redis/Memcached codebases because it’s the simplest.
- Write-through. The application writes to the cache, and the cache writes synchronously to the database. Every write completes in both places or not at all. Reads are always served from the cache, which is always at most as stale as the last write. The cost: write latency doubles (cache + database), and every write hits the cache regardless of whether anybody ever reads that entry again. Useful for workloads where reads dominate and freshness is paramount.
- Write-behind (write-back). The application writes to the cache; the cache acknowledges immediately and flushes to the database asynchronously. Write latency drops to in-memory speed; throughput climbs; the cost is that the database is eventually consistent and a cache-node failure between accept and flush loses writes. Most AWS-managed caches don’t expose a first-class write-behind mode; the pattern typically shows up as application-level buffering in front of a cache-aside store, not as a checkbox in ElastiCache.
- Read-through. The application asks the cache for a value; on miss, the cache itself reads the database, populates, and returns. Structurally similar to cache-aside but with the fetch logic inside the cache layer rather than in the application. Cleaner programming model, tighter coupling between cache and database. DAX in front of DynamoDB is the canonical AWS shape; for ElastiCache Redis/Valkey, the equivalent is usually a wrapper library rather than a native feature.
- Refresh-ahead. The cache proactively refreshes entries before they expire, based on usage prediction. Keeps hot entries fresh without the miss stall that cache-aside produces when TTLs expire under load. Harder to tune; useful for read-heavy workloads with predictable access patterns.
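To make the difference between cache-aside and read-through concrete, here is a minimal in-process sketch of a read-through wrapper. It is not the API of DAX or of any ElastiCache client; the class name ReadThroughCache, the loader callback, and the injectable clock are all hypothetical names chosen for illustration. The point is only that the caller sees a single get, and the fetch-on-miss logic lives inside the cache layer.

```python
import time


class ReadThroughCache:
    """Illustrative read-through wrapper: the cache owns the fetch-on-miss logic."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # hypothetical callback that reads the database
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._entries = {}           # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]          # hit: the caller never sees the database
        value = self._loader(key)    # miss or expired: the cache layer fetches
        self._entries[key] = (value, now + self._ttl)
        return value
```

A caller would construct it once, for example ReadThroughCache(load_product_row, ttl_seconds=600) with load_product_row standing in for whatever function runs the query, and then call get everywhere. The coupling mentioned above shows up in the constructor: the cache layer has to know how to reach the database.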
Side by side
| Pattern | Freshness | Who reads DB on miss | Write cost | If cache is down | App complexity |
|---|---|---|---|---|---|
| Cache-aside | Bounded by TTL | Application | DB only (+ optional invalidate) | Fallback to DB | Moderate |
| Write-through | Last write | Application | DB + cache, synchronous | Writes fail | Low |
| Write-behind | Eventually (flush interval) | Application | Cache only (async to DB) | Writes lost | High |
| Read-through | Bounded by TTL | Cache | DB only | Reads fail | Low |
| Refresh-ahead | Near-real-time for hot keys | Cache (pre-emptively) | DB only | Reads fall back | High |
Reading the table by workload rather than by pattern:
- Catalogue API: read-heavy, rarely-changing rows, tolerates minutes of staleness. Cache-aside with a 10-minute TTL and explicit invalidation on merchandiser update. Load on RDS drops by orders of magnitude; the cost is a tiny amount of application-level glue.
- Inventory service: balanced reads and writes, freshness matters at checkout but not at display, writes must not be lost. Write-through gives the display-read side a cache that’s always current with the last write; the checkout path bypasses the cache and reads the authoritative database row with SELECT ... FOR UPDATE.
- Leaderboard: writes dominate reads, per-key latency on read is the point, the underlying dataset is exactly the kind of thing Redis sorted sets are built for. This is less “cache in front of a database” and more “the database for this feature is Redis, persisted periodically back to the relational store for reporting.” A Redis-first pattern, which we’ll look at separately.
Choosing a pattern per workload
The picks in depth
Catalogue API → cache-aside with TTL and explicit invalidation. The happy path in code is four lines: read the cache, on miss read the database, set the cache entry with a TTL, return. The staleness bound is the TTL; pick the TTL long enough to move load off the database (a few minutes is usually plenty for a rarely-changed row) and add an explicit invalidation on the write path (the merchandiser’s CMS calls DEL product:{id} after updating the row). The cache then behaves as if the TTL were infinite in the common case, with the TTL as a safety net for missed invalidations.
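The read and write paths described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation: InMemoryCache is a stand-in written for this example so the code runs without a server, with method shapes (get, set with ex, delete) chosen to mirror the Redis GET, SET ... EX, and DEL commands. The load_from_db and save_to_db callbacks are hypothetical placeholders for the real queries.

```python
import json
import time


class InMemoryCache:
    """Stand-in for Redis GET / SET EX / DEL semantics (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}              # key -> (value, expires_at)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]      # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ex):
        self._data[key] = (value, self._clock() + ex)

    def delete(self, key):
        self._data.pop(key, None)


PRODUCT_TTL_SECONDS = 600            # the ten-minute TTL used in this article


def get_product(cache, load_from_db, product_id):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # hit
    row = load_from_db(product_id)                # miss: run the expensive query
    if row is not None:
        cache.set(key, json.dumps(row), ex=PRODUCT_TTL_SECONDS)
    return row


def update_product(cache, save_to_db, product_id, fields):
    save_to_db(product_id, fields)                # database first
    cache.delete(f"product:{product_id}")         # then invalidate, like DEL product:{id}
```

Writing to the database before deleting the key keeps the window in which a reader can repopulate the cache with the old row as small as this simple scheme allows; the TTL covers whatever slips through.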
Key design matters. Include the API version in the key (v2:product:{id}) so a schema change doesn’t accidentally serve old shapes to new code; include the Region or tenancy if the data is partitioned; don’t include request-specific things like user or locale if the payload doesn’t actually vary by them, or the hit rate collapses. Serialisation matters too: JSON is legible but wastes bytes on well-known schemas; MessagePack or protobuf can halve the memory footprint at the cost of debuggability.
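As a small illustration of that key layout, a key builder might look like the sketch below. The function name product_cache_key and the position of the Region segment are assumptions made for the example; the only shape taken from the text is the versioned prefix, as in v2:product:{id}.

```python
def product_cache_key(product_id, api_version="v2", region=None):
    """Build a versioned cache key; region is included only when data is partitioned."""
    parts = [api_version]
    if region is not None:
        parts.append(region)         # assumption: region sits right after the version
    parts.extend(["product", str(product_id)])
    return ":".join(parts)
```

Keeping key construction in one function makes it harder for a caller to slip a user ID or locale into the key by accident, which is the hit-rate risk described above.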
The failure story is graceful: on cache outage, the application falls back to the database, which suddenly sees 3,000 RPS again. That’s a spike the primary might not survive, which is why the database needs enough headroom to absorb a cache failure, and why the cache cluster itself is Multi-AZ with automatic failover.
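The fallback itself is a few lines of error handling. The sketch below assumes a hypothetical CacheUnavailable exception and plain callables for the cache and database operations; with a real client the exception would be whatever that library raises on connection failure (for redis-py, redis.exceptions.RedisError is the base class).

```python
class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cluster cannot be reached."""


def get_with_fallback(cache_get, cache_set, load_from_db, key, ttl_seconds=600):
    try:
        cached = cache_get(key)
    except CacheUnavailable:
        # Cache is down: serve from the database and skip the cache write.
        return load_from_db(key)
    if cached is not None:
        return cached
    value = load_from_db(key)
    try:
        cache_set(key, value, ttl_seconds)
    except CacheUnavailable:
        pass                         # the read still succeeds; the next read misses again
    return value
```

This keeps requests succeeding during an outage, but it does nothing to protect the database from the resulting load, which is why the sizing point above still applies.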
Inventory service → write-through on display, direct-to-database on checkout. The display side benefits from a cache because the read volume is real and the freshness guarantee is weak enough that an application-managed write-through is cheap: on every write, update the cache and the database together. On read, always go to the cache. The worst-case freshness is “the last successful write,” which is exactly what the display path needs.
The checkout path is different. “Is there stock?” at the moment of placing an order cannot be answered by a cache, because the cache lags even by milliseconds and the cost of an over-sell is a customer-facing failure. The checkout path reads the authoritative row in the same transaction that writes the order, using a SELECT ... FOR UPDATE to lock it, and writes back through the cache as part of the same code path. Display is a cache lookup; checkout is a database lookup; the cache is not on the correctness path.
The write-through has a failure mode worth naming. If the database write succeeds and the cache write fails, the cache now has an older value than the database. The simplest remedy is to invalidate the cache key in the application after every successful write rather than setting it: a delete-then-populate-on-next-read behaviour that converges to the correct value at the cost of an extra miss. For inventory, the extra miss is negligible.
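A sketch of that invalidate-on-write variant for the display path, using plain dictionaries as stand-ins for the database table and the cache so the example runs on its own. The function names and the inventory:{sku} key shape are assumptions for illustration, and the checkout path with its row lock is deliberately left out.

```python
def set_available_quantity(db, cache, sku, quantity):
    """Write path: commit to the database, then drop the cached copy."""
    db[sku] = quantity                       # 1. authoritative write
    cache.pop(f"inventory:{sku}", None)      # 2. invalidate instead of setting


def get_display_quantity(db, cache, sku):
    """Display read path: serve from cache, repopulate on miss."""
    key = f"inventory:{sku}"
    if key in cache:
        return cache[key]
    quantity = db[sku]                       # the extra miss after each write lands here
    cache[key] = quantity
    return quantity
```

In a real deployment the delete can fail too, so a TTL on the populated entry is still worth keeping as a backstop.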
Leaderboard → Redis as the system of record. The leaderboard isn’t really a cache problem; it’s a data-structure problem. The operations the feature needs, “insert a score” and “read the top 100”, are ZADD and ZREVRANGE on a Redis sorted set: O(log N) for the insert and O(log N + M) for a read that returns M entries. The relational database would need a b-tree index, a partial materialised view, or a lot of pain to do the same thing at the same latency.
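The two operations can be sketched as below. The functions take any client object exposing zadd(name, mapping) and zrevrange(name, start, end, withscores=...), which is the shape redis-py uses for those commands. FakeSortedSetClient is a stand-in written for this example so the code runs without a server, and the key name leaderboard:24h is hypothetical. The sketch does not implement the 24-hour window; how scores age out is left open here.

```python
class FakeSortedSetClient:
    """In-memory stand-in for the two sorted-set calls used below (illustrative only)."""

    def __init__(self):
        self._sets = {}

    def zadd(self, name, mapping):
        # Like ZADD without flags: a member's score is overwritten by the latest value.
        self._sets.setdefault(name, {}).update(mapping)

    def zrevrange(self, name, start, end, withscores=False):
        # Highest score first; ties broken by member name, descending.
        # Only end == -1 is supported as a negative index in this stand-in.
        ranked = sorted(self._sets.get(name, {}).items(),
                        key=lambda kv: (kv[1], kv[0]), reverse=True)
        stop = None if end == -1 else end + 1     # Redis range ends are inclusive
        window = ranked[start:stop]
        return window if withscores else [member for member, _ in window]


LEADERBOARD_KEY = "leaderboard:24h"               # hypothetical key name


def report_score(client, player_id, score):
    client.zadd(LEADERBOARD_KEY, {player_id: score})


def top_scores(client, limit=100):
    return client.zrevrange(LEADERBOARD_KEY, 0, limit - 1, withscores=True)
```

Against a real server the returned members may be bytes and the scores floats, depending on how the client is configured, so callers should not rely on the exact types this stand-in returns.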
Configure ElastiCache with AOF persistence (append-only file) and Multi-AZ so a node failure doesn’t lose the last few minutes of scores. Dump the sorted set to the relational store on a schedule for reporting, archive, and compliance. That’s “eventual consistency for reporting,” which is the correct level of freshness for reporting. The primary is Redis; the relational database is the archive.
A small note on ElastiCache flavours: the team should be running the Valkey or Redis OSS engine; the engine choice affects licensing and feature parity, but the patterns themselves (cache-aside, write-through, sorted sets, AOF) are the same. Memcached is on the same ElastiCache service but is less suited to any of these three workloads: no sorted sets, no persistence, simpler eviction.
A worked sequence: one second of the catalogue API
3,000 requests land in a second; the cache is warm:
t=0ms 3000 × GET /products/:id arrive
t=1ms app reads Redis: 2982 hits, 18 misses
t=2ms 2982 responses returned (p50 < 2ms end-to-end)
t=3ms 18 misses → app reads RDS
t=8ms 18 responses returned, 18 × SET product:{id} EX 600
RDS sees 18 read queries instead of 3,000: a 99.4% offload. Memory on the cache is dominated by the product catalogue’s hot set, which is maybe 100k items at a few KB each (a few hundred MB, easy to size). The TTL is set when a miss fills the entry and is not extended by later hits, so a product that was read once an hour ago and once now expired long before the second read. That second read is itself a miss, which is healthy: cold products don’t take up memory indefinitely.
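The arithmetic behind those two figures is short enough to write out. The request and miss counts come from the sequence above; the 3 KiB per item is an assumption standing in for "a few KB", and the estimate ignores per-key overhead in the cache engine.

```python
requests_per_second = 3000
misses_per_second = 18

# Fraction of reads that never reach RDS.
offload = 1 - misses_per_second / requests_per_second

hot_items = 100_000
bytes_per_item = 3 * 1024            # assumption: "a few KB" taken as 3 KiB
hot_set_mib = hot_items * bytes_per_item / (1024 ** 2)

print(f"offload: {offload:.1%}")            # offload: 99.4%
print(f"hot set: {hot_set_mib:.0f} MiB")    # hot set: 293 MiB
```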
What’s worth remembering
- Every caching pattern is an answer to “how stale can a read be?” Pick the pattern whose staleness bound matches the feature’s tolerance.
- Cache-aside is the default. Application reads the cache, falls back to the database, populates on miss. Simple, resilient, bounded by TTL and by explicit invalidation.
- Write-through is for read-dominant workloads where freshness-on-write matters. Every write updates both stores; every read hits the cache; the cost is doubled write latency.
- Write-behind shortens write latency at the cost of durability. Rarely offered as a managed AWS feature; usually an application-level pattern if used.
- Read-through pushes the fetch logic into the cache layer. DAX is the clean AWS example; for ElastiCache Redis it’s typically a wrapper rather than a native feature.
- TTL plus explicit invalidation is the correct default combination. TTL is the safety net for missed invalidations; explicit invalidation is the quality-of-service lever for normal operation.
- Cache-outage behaviour matters as much as cache-hit behaviour. The database has to be sized to survive the cache going away; the cache has to be Multi-AZ if the workload can’t.
- Sometimes the cache is the primary. For sorted sets, rate-limit counters, and session stores, Redis is the correct system of record, not the cache in front of one.
Cache-aside sits in front of read-heavy rarely-changing rows; write-through sits in front of balanced reads and writes where freshness-on-write matters; Redis-as-primary shows up when the feature is what Redis was built for. Three read-write shapes, three patterns, three matches. The work isn’t picking a favourite; it’s pairing each workload with the pattern that matches its shape.