CloudFront Caching for Mixed Static and Personal Content

September 14, 2026 · 16 min read

Solutions Architect · SAA-C03 · part of The Exam Room

The situation

Everything today lives behind one ALB in eu-west-1:

  • A marketing shell: homepage, category landing pages, a few hundred assets. Rebuilt by CI twice a month. Identical bytes for every visitor.
  • A product catalogue API under /api/catalogue/*. JSON responses built from a database, refreshed on a schedule. Same response for every visitor for about an hour at a time.
  • A signed-in account area under /account/*: order history, saved addresses, loyalty points. Different per visitor. Uses a session cookie and an Authorization header.
  • Static product images under /img/*. JPEG and WebP, served today from S3 via the ALB. Never change once written; renamed when updated.

The ALB handles roughly 1,200 requests per second at peak, of which something like 70% is images, 20% is catalogue, 8% is marketing HTML, and 2% is account. The bill is unhappy about that 70%, and the p95 for the catalogue API is around 240 ms because every hit traverses the ALB and a warm-enough application.

The ask: put CloudFront in front, keep one domain, and get the cache hit ratio somewhere useful without ever, not once, serving a signed-in visitor someone else’s order history.

What actually matters

Before reaching for cache behaviours, it’s worth naming what “cacheable” actually means for each path.

The marketing shell is the easiest case: bytes that are identical for every visitor, change on a human schedule, and have deterministic URLs. This is the content CloudFront was designed to serve: long TTLs, aggressive edge caching, invalidation on release. The only question is how long is long, and the answer is roughly “as long as the renaming scheme allows”, which for a build with content-hashed filenames is effectively forever.

The product catalogue is the interesting middle case. A response to GET /api/catalogue/widgets?page=2 is identical for every visitor for an hour, but the URL has a query string, and there will occasionally be a need to bypass the cache (price update, stock change). CloudFront can cache query-stringed responses (the cache key includes whichever query parameters we tell it to include), but we have to be specific about which ones actually vary the response, or we’ll cache as many variants as there are unique strings and hit nothing. The TTL wants to match the refresh cadence: one hour on the edge, with a max-age from the origin that lines up, and an invalidation hook when the catalogue job finishes early.

The account area is the case where caching kills us. Two visitors hitting /account/orders must never see each other’s orders. The safe move is not “cache for a very short time” but “do not cache at all”, because any non-zero TTL is a window in which the wrong bytes can be served to the wrong person. The distribution needs a way to express “always pass through to origin” while still forwarding the cookies and headers the application needs to identify the visitor.

The product images sit in a fourth category: genuinely static, genuinely immutable, but served today from the load balancer because nobody ever moved them. Serving them directly from object storage via the distribution gets them off the load balancer’s plate entirely. The cost and latency improvements are mechanical.

And there’s a cross-cutting concern: origin selection. One distribution can have multiple origins and route between them by path. The path-based routing lives in ordered, first-match-wins rules, with a default rule as the fallback.
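The ordered, first-match-wins lookup can be modelled in a few lines. This is a minimal sketch rather than CloudFront’s actual matcher: the origin labels are made up for illustration, and Python’s fnmatchcase stands in for path-pattern matching (it is case-sensitive and treats * and ? as wildcards, but it also accepts [seq] patterns, which is an fnmatch feature and not something assumed of CloudFront).

```python
from fnmatch import fnmatchcase

# Ordered behaviours: (path pattern, origin label). The labels are illustrative only.
BEHAVIOURS = [
    ("/img/*", "s3-images"),
    ("/api/catalogue/*", "alb"),
    ("/account/*", "alb"),
]
DEFAULT_ORIGIN = "s3-marketing"  # stands in for the Default (*) behaviour, always last


def route(path):
    """Return (matched pattern, origin) using first-match-wins ordering."""
    for pattern, origin in BEHAVIOURS:
        if fnmatchcase(path, pattern):
            return pattern, origin
    # Nothing specific matched, so the default behaviour takes the request.
    return "Default (*)", DEFAULT_ORIGIN


if __name__ == "__main__":
    for p in ["/img/sha256/abc123.webp", "/api/catalogue/widgets", "/account/orders", "/checkout"]:
        print(p, "->", route(p))
```

The point of the sketch is the fall-through: a path nobody planned for, such as /checkout here, lands on the default behaviour and its origin, whether or not that origin can serve it.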

Finally, what goes in the cache key is the decision that makes or breaks hit ratio. Every cookie, every header, every query parameter included in the cache key multiplies the variants. Keying on Authorization on the account path guarantees a miss every time (which is correct). Keying on User-Agent on the marketing shell accidentally shards the cache by browser version, which is catastrophic and silent.
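To make the sharding concrete, here is a simplified model of a cache key. It is a sketch under stated assumptions, not CloudFront’s real key format: the cache_key function, its tuple layout, and the browser strings are all hypothetical, and the model does not reorder query parameters.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def cache_key(url, headers, query_allowlist=(), header_allowlist=()):
    """Simplified key: path + allowlisted query params + allowlisted header values."""
    parts = urlsplit(url)
    allowed = set(query_allowlist)
    # Keep only allowlisted parameters, in the order the viewer sent them.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k in allowed]
    lowered = {name.lower(): value for name, value in headers.items()}
    hdrs = tuple((h.lower(), lowered.get(h.lower(), "")) for h in header_allowlist)
    return (parts.path, urlencode(query), hdrs)


if __name__ == "__main__":
    a = ("/api/catalogue/widgets?page=2&utm_source=email", {"User-Agent": "BrowserA/1.0"})
    b = ("/api/catalogue/widgets?page=2", {"User-Agent": "BrowserB/2.0"})

    # Only "page" in the key: both requests share one entry.
    print(cache_key(*a, query_allowlist=("page",)) == cache_key(*b, query_allowlist=("page",)))

    # Add User-Agent to the key: the same content now splits per browser string.
    print(cache_key(*a, query_allowlist=("page",), header_allowlist=("User-Agent",))
          == cache_key(*b, query_allowlist=("page",), header_allowlist=("User-Agent",)))
```

The first comparison is true and the second is false, which is the whole argument in miniature: one extra key input turns a shared entry into one entry per distinct value.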

What we’ll filter on

  1. Cacheability: can two different visitors share the same response?
  2. TTL horizon: how long can a cached response serve before it’s wrong?
  3. Origin type: S3 object, ALB target, or something else?
  4. Cache key inputs: which query strings, headers, and cookies legitimately vary the response?
  5. Invalidation story: how does a change at origin reach visitors faster than the TTL?
  6. Security posture: signed URLs, WAF, origin access control, or none?

The behaviour landscape

  1. Default behaviour (marketing shell → S3 marketing bucket). Path pattern Default (*). Origin: the marketing bucket via Origin Access Control, so the bucket stays private and only the distribution can read it. Cache policy: CachingOptimized (AWS managed), which puts no query strings, headers, or cookies in the cache key beyond the normalised Accept-Encoding header, and respects origin Cache-Control. Origin request policy: CORS-S3Origin or none, depending on whether cross-origin assets matter. TTL: origin sends Cache-Control: public, max-age=31536000, immutable on hashed assets, max-age=300 on the HTML shell. Compress objects automatically.

  2. /api/catalogue/* → ALB, cached. Path pattern /api/catalogue/*. Origin: the ALB, HTTPS only, with an origin custom header X-Origin-Verify: <shared-secret> that the ALB’s listener rule requires, so requests that bypass CloudFront and hit the ALB directly get a 403. Cache policy: custom, named CatalogueCache; the cache key includes the page, category, and sort query string parameters and nothing else; default TTL 3600 s at the edge, min TTL 0 so origin can shorten with Cache-Control: max-age=300 if it wants. Origin request policy: AllViewerExceptHostHeader is too broad; a custom policy forwarding only the query-string allowlist keeps the origin request small.

  3. /account/* → ALB, not cached. Path pattern /account/*. Origin: the ALB. Cache policy: CachingDisabled (AWS managed), which sets min, max, and default TTL to 0. Origin request policy: AllViewer (forward every cookie, every header, every query string), so the application can authenticate the request exactly as it would without CloudFront in front. Response headers policy: a strict no-store policy that sets Cache-Control: no-store, private, as belt and braces against any intermediate cache misreading the response. Viewer protocol: HTTPS only.

  4. /img/* → S3 images bucket. Path pattern /img/*. Origin: the images bucket via OAC. Cache policy: CachingOptimized. The URLs are content-addressed (/img/sha256/abc123.webp), so a year-long TTL is safe. No origin request policy beyond the S3 basics. This is the behaviour that eats the biggest fraction of the bill and the biggest fraction of the latency.

  5. /admin/* (future) → ALB, allowlist-restricted. Not live today, but worth naming: admin traffic goes through the same distribution, behind an AWS WAF web ACL attached to the distribution with an IP allowlist rule and CachingDisabled. One distribution, one web ACL, one place to audit.

  6. Everything weird (default fallback). The default behaviour catches any path not matched by the specific ones. If a new path shows up and we haven’t decided on a cache policy for it, it falls through to the marketing-shell behaviour, which will serve from the wrong origin and return an error (a 404, or a 403 when the private bucket does not grant s3:ListBucket). The default is a safety net; the specific behaviours earn their place.

Side by side

Behaviour                 | Cacheable | TTL                     | Origin          | Cache key                | Policy
Marketing shell (default) | Yes       | 5 min HTML, 1 yr assets | S3 (OAC)        | Accept-Encoding          | CachingOptimized
/api/catalogue/*          | Yes       | 1 h                     | ALB             | allowlisted query params | CatalogueCache (custom)
/account/*                | No        | 0                       | ALB             | n/a                      | CachingDisabled
/img/*                    | Yes       | 1 yr                    | S3 images (OAC) | Accept-Encoding          | CachingOptimized

Reading the table: the distribution is a router as much as a cache. Each behaviour answers two questions (where does this path go, and under what cache policy), and the answer for /account/* is deliberately the boring “nowhere cacheable”, because the cost of a wrong answer there dwarfs the cost of every cache miss on the API path for a year.

The path routing diagram

[Diagram: Viewer → shop.example.com → CloudFront distribution (E1ABCD…): 4 cache behaviours, ordered, first-match wins; HTTPS only, TLS 1.2+, compression on, HTTP/2 + HTTP/3; web ACL: rate-based rule + managed Core Rule Set.
  /img/* (CachingOptimized, TTL 1 year, content-hashed URLs) → S3 images bucket shop-images (private, OAC → bucket policy, immutable objects)
  Default (*) (CachingOptimized, 5 min HTML / 1 yr assets, marketing shell) → S3 marketing bucket shop-marketing (private, OAC → bucket policy, invalidated on release)
  /api/catalogue/* (CatalogueCache (custom), TTL 1 hour, page, category, sort only) → Application Load Balancer
  /account/* (CachingDisabled, TTL 0, never cache, AllViewer origin request) → Application Load Balancer
  Application Load Balancer: shop-alb.eu-west-1.elb.amazonaws.com, listener rule requires X-Origin-Verify header, catalogue service + account service
  Green = cacheable path; red-dashed = pass-through (TTL 0). Private origins reachable only via the distribution.]
One distribution, four behaviours, three origins. The cache policy on each behaviour answers the question "can two visitors share this?" before it answers "how long?"

The cache key, in depth

The cache policy on each behaviour is really a specification for what goes into the cache key. The policy has three inputs (query strings, headers, and cookies), and each is set explicitly: none, a whitelist (named list), or, for query strings and cookies, all or all-except. AllViewer is an origin request policy setting, not a cache policy one. Alternatively, one of a handful of AWS managed policies can be attached instead of a custom one.

For /api/catalogue/*, the cache key specification looks like this:

CachePolicyConfig:
  Name: CatalogueCache
  MinTTL: 0
  DefaultTTL: 3600
  MaxTTL: 86400
  ParametersInCacheKeyAndForwardedToOrigin:
    EnableAcceptEncodingGzip: true
    EnableAcceptEncodingBrotli: true
    QueryStringsConfig:
      QueryStringBehavior: whitelist
      QueryStrings: [page, category, sort]
    HeadersConfig:
      HeaderBehavior: none
    CookiesConfig:
      CookieBehavior: none

Three query parameters included, nothing else. If a visitor appends ?page=2&category=widgets&sort=price&utm_source=email, the utm_source is stripped from the cache key, so the email-tracked link and the organic link share a cache entry. If the origin sends Cache-Control: max-age=300, CloudFront caches for 300 s instead of the default 3600. DefaultTTL only applies when the origin sends no caching directive; otherwise the origin’s value wins, clamped between MinTTL and MaxTTL, so with MaxTTL at 86400 the origin can also lengthen the edge lifetime up to a day.
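How the three TTL settings interact with the origin’s Cache-Control can be sketched as a small function. This is a simplified model, not CloudFront’s implementation: the effective_ttl name is hypothetical, the Expires header is ignored, and the defaults are the CatalogueCache values from the policy above.

```python
import re


def effective_ttl(cache_control, min_ttl=0, default_ttl=3600, max_ttl=86400):
    """Approximate the edge TTL in seconds for one response."""
    directives = [d.strip().lower() for d in (cache_control or "").split(",") if d.strip()]

    # Directives that forbid shared caching fall back to the minimum TTL
    # (0 here, so the response is not kept at the edge).
    if any(d in ("no-store", "no-cache", "private") for d in directives):
        return min_ttl

    # s-maxage takes precedence over max-age for a shared cache.
    for name in ("s-maxage", "max-age"):
        for d in directives:
            match = re.fullmatch(name + r"=(\d+)", d)
            if match:
                # The origin's value wins, clamped between min and max.
                return max(min_ttl, min(int(match.group(1)), max_ttl))

    # No usable directive from the origin: the policy default applies.
    return default_ttl


if __name__ == "__main__":
    for header in [None, "max-age=300", "max-age=7200",
                   "public, max-age=31536000, immutable", "no-store, private"]:
        print(repr(header), "->", effective_ttl(header))
```

Under this model a max-age of 7200 from the origin yields a two-hour edge lifetime, longer than the one-hour default. If the edge must never hold a catalogue response for more than an hour, lowering MaxTTL to 3600 is the setting that enforces it.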

For /account/*, the origin request policy is the opposite posture (pass everything), but the cache policy is CachingDisabled, so nothing is actually stored:

CachePolicy: CachingDisabled (AWS managed)
  MinTTL: 0, DefaultTTL: 0, MaxTTL: 0

OriginRequestPolicy: AllViewer (AWS managed)
  Headers: AllViewer (includes Authorization, Cookie)
  QueryStrings: All
  Cookies: All

The two are paired because AllViewer in the origin request policy forwards headers and cookies to the origin without including them in the cache key. The origin sees the full request; nothing is cached. This is the canonical “pass-through” pattern.

One subtlety worth calling out: Authorization is never a useful cache key input. CloudFront’s managed cache policies leave it out, and a custom policy that whitelisted it would still effectively disable caching, because the header value is unique per session and each cache entry would serve exactly one visitor. The correct way to “cache per authenticated user” is not to cache at all, unless the application explicitly serves responses with Cache-Control: private, max-age=N and the infrastructure in front respects the private directive.

A worked trace: four requests, four paths

This is the trace to run when debugging cache hit ratios: four requests through the distribution, one for each behaviour. Every x-cache response header comes from the edge; age is the number of seconds the entry has been in the cache.

$ curl -I https://shop.example.com/
HTTP/2 200
content-type: text/html
cache-control: public, max-age=300
age: 187
x-cache: Hit from cloudfront
x-amz-cf-pop: LHR50-P4

$ curl -I https://shop.example.com/img/sha256/d8a9f2.webp
HTTP/2 200
content-type: image/webp
cache-control: public, max-age=31536000, immutable
age: 4102
x-cache: Hit from cloudfront
x-amz-cf-pop: LHR50-P4

$ curl -I "https://shop.example.com/api/catalogue/widgets?page=2&category=blue&utm_source=email"
HTTP/2 200
content-type: application/json
cache-control: max-age=3600
age: 412
x-cache: Hit from cloudfront
x-amz-cf-pop: LHR50-P4

$ curl -I https://shop.example.com/account/orders \
    -H "Authorization: Bearer eyJhbGc..." -H "Cookie: session=abc123"
HTTP/2 200
content-type: application/json
cache-control: no-store, private
x-cache: Miss from cloudfront
x-amz-cf-pop: LHR50-P4

Three hits, one miss. The account request is a miss because the behaviour is CachingDisabled: every request for /account/* misses, which is the correct behaviour. The catalogue request is a hit despite the utm_source tracker, because the cache key only includes page, category, and sort. The image and marketing responses are hits against long TTLs.
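When the trace grows past four requests, reading headers by eye stops scaling. A minimal sketch of a classifier for curl -I output follows; the parse_headers and classify helpers are hypothetical names, and the two samples are copied from the trace above.

```python
HOME = """HTTP/2 200
content-type: text/html
cache-control: public, max-age=300
age: 187
x-cache: Hit from cloudfront
x-amz-cf-pop: LHR50-P4"""

ACCOUNT = """HTTP/2 200
content-type: application/json
cache-control: no-store, private
x-cache: Miss from cloudfront
x-amz-cf-pop: LHR50-P4"""


def parse_headers(raw):
    """Turn raw response headers into a dict with lower-cased names."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def classify(raw):
    """Return (result, age): the first word of x-cache, and age in seconds if present."""
    headers = parse_headers(raw)
    words = headers.get("x-cache", "").split()
    result = words[0].lower() if words else "unknown"
    age = int(headers["age"]) if "age" in headers else None
    return result, age


if __name__ == "__main__":
    print("home:", classify(HOME))
    print("account:", classify(ACCOUNT))
```

A missing age header is kept as None rather than 0, so a pass-through response is not mistaken for a freshly cached one.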

Hit ratio on this distribution should sit in the 80-85% range at peak, driven by the image traffic. The Miss on /account/* shows up in CloudFront logs as a pass-through with x-edge-result-type: Miss, but no Hit will ever follow for that path; the policy guarantees it.
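The blended figure is a weighted average of per-path hit rates. The traffic shares below come from the peak numbers at the top of the post; the per-path hit rates are assumptions picked purely for illustration, not measurements.

```python
# Traffic shares from the peak figures quoted earlier in the post.
TRAFFIC_SHARE = {"images": 0.70, "catalogue": 0.20, "marketing": 0.08, "account": 0.02}

# ASSUMED per-path hit rates, for illustration only. Account is 0 by design.
ASSUMED_HIT_RATE = {"images": 0.88, "catalogue": 0.80, "marketing": 0.75, "account": 0.0}

PEAK_RPS = 1200


def blended_hit_ratio(share, hit_rate):
    """Weighted average of hit rates, weighted by each path's share of requests."""
    return sum(share[path] * hit_rate[path] for path in share)


def origin_rps(peak_rps, share, hit_rate):
    """Requests per second that still reach an origin after the edge absorbs the hits."""
    return peak_rps * (1 - blended_hit_ratio(share, hit_rate))


if __name__ == "__main__":
    ratio = blended_hit_ratio(TRAFFIC_SHARE, ASSUMED_HIT_RATE)
    print(f"blended hit ratio: {ratio:.1%}")
    print(f"origin rps at peak: {origin_rps(PEAK_RPS, TRAFFIC_SHARE, ASSUMED_HIT_RATE):.0f}")
```

With these assumed rates the blend lands at roughly 84%, inside the range quoted above. The ceiling is 98%, because the 2% of account traffic can never hit.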

Invalidation, and when not to invalidate

Invalidation is how content at origin gets to visitors faster than the TTL would normally allow. CloudFront charges for invalidation paths after the first 1,000 per month, and the operation is eventually consistent: a CreateInvalidation call returns immediately, and the edge locations drop the entry within a few minutes.

The marketing shell uses invalidation on release: CI runs aws cloudfront create-invalidation --distribution-id E1ABCD --paths "/index.html" "/categories/*", which is a handful of paths per release. The hashed assets never need invalidation: their URLs change when the content changes.
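If the release hook is scripted in Python rather than through the CLI, the same request can be assembled as a plain dictionary first. The sketch below assumes a hypothetical build_invalidation_batch helper; it only builds and validates the batch, and the boto3 call is shown as a comment because it needs credentials and a real distribution.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for a CreateInvalidation request."""
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"invalidation path must start with '/': {path!r}")
    # Drop duplicates while keeping order, since each listed path counts
    # towards the monthly allowance.
    unique = list(dict.fromkeys(paths))
    return {
        "Paths": {"Quantity": len(unique), "Items": unique},
        # CallerReference should be unique per request; a timestamp is one simple choice.
        "CallerReference": caller_reference or f"release-{int(time.time())}",
    }


if __name__ == "__main__":
    batch = build_invalidation_batch(["/index.html", "/categories/*"], "release-42")
    print(batch)
    # With boto3 installed and credentials configured, the batch would be sent as:
    #   import boto3
    #   boto3.client("cloudfront").create_invalidation(
    #       DistributionId="E1ABCD", InvalidationBatch=batch)
```

Keeping the batch construction separate from the API call makes the release hook easy to test in CI without touching the distribution.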

The catalogue API mostly doesn’t need invalidation. The 1-hour TTL means the worst case for a stale price at the edge is up to an hour, and the refresh job running hourly keeps origin in sync. For the rare early-update case (a launch, a recall), a targeted invalidation of /api/catalogue/widgets* clears the relevant keys without touching the rest.

The account area cannot be invalidated because it isn’t cached. The operational lever is the Cache-Control: no-store, private header on responses, reinforced by the distribution’s cache policy.

What’s worth remembering

  1. One distribution, many behaviours. A single CloudFront distribution can route paths to multiple origins via cache behaviours, ordered first-match. The default behaviour is the fallback; specific behaviours are the decisions.
  2. Cache policy, origin request policy, response headers policy. Three separate policies per behaviour. The cache policy defines the cache key; the origin request policy defines what the origin sees; the response headers policy defines what the viewer sees. Mixing them up is where most subtle caching bugs live.
  3. Cacheable traffic wants minimal cache key inputs. Every additional query string, header, or cookie fragment multiplies variants. Whitelist the parameters that legitimately vary the response; strip the rest.
  4. Uncacheable traffic wants CachingDisabled + AllViewer. Not “very short TTL”, not “cache with session key”. Zero TTL plus pass-through forwarding is the only correct shape for authenticated per-visitor responses.
  5. Authorization is never a useful cache key. Managed cache policies leave it out. Custom policies that include it still don’t cache usefully. Authenticated endpoints go through CachingDisabled.
  6. Origin Access Control locks S3 origins to the distribution. The images and marketing buckets are private; only the distribution can read them. Public buckets behind CloudFront are a historical accident; OAC is the current answer.
  7. Invalidation is a release hook, not a per-request tool. Hashed asset filenames eliminate invalidation for static bundles. Targeted path invalidation is for the HTML shell and the occasional API override. Frequent broad invalidations are a sign the TTL is wrong.
  8. Hit ratio lives in the logs. CloudFront access logs, broken down per path, and the CacheHitRate CloudWatch metric (an additional metric that has to be enabled, reported per distribution) tell you which cache policy is earning its keep. A behaviour with 0% hit ratio is either uncacheable (correct) or mis-configured (fix it).

One domain, four behaviours, three origins. The distribution answers two questions on every request (where does this go, and can it share a cached copy), and the answers are the difference between a 240 ms p95 on the ALB and a 15 ms p95 at the edge for most of the traffic.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.