Cache Key Architecture
CDN cache keys control which requests share a cached response and which generate a separate storage entry. A misconfigured key causes cache fragmentation — multiplying origin load and reducing hit ratios — or, worse, collapses distinct versions of a file into a single cache slot so stale bytes reach users. Understanding cache key construction, content hash embedding, Vary header behaviour, and edge normalization rules is the foundation for operating fingerprinted assets at scale.
When to Use Deterministic Path-Based Cache Keys
Path-based fingerprinting — where the hash lives in the filename such as app.a1b2c3d4.js — is the correct choice whenever:
- Assets are served through a CDN that normalises or drops query strings by default.
- You need Cache-Control: public, max-age=31536000, immutable to take full effect without per-request revalidation.
- Build pipeline output is deterministic so the same source always produces the same hash.
- You want new deployments to be atomic: old hashed URLs remain live while HTML is updated.
Query-parameter versioning (app.js?v=a1b2c3d4) is a valid fallback when the build system cannot rewrite HTML references or when you need rapid toggling without a full rebuild. The trade-offs between these two approaches are covered in detail on the query parameters vs filenames page.
Prerequisites
| Requirement | Why It Matters |
|---|---|
| Webpack 5, Vite 5, Rollup 4, esbuild 0.20+, or equivalent | contenthash / hash output template support |
| CDN with programmable cache key rules (Cloudflare, CloudFront, Fastly, Nginx) | Override default query-string handling |
| CI/CD pipeline writing an asset manifest | Maps logical names to hashed filenames for HTML injection |
| Deterministic build environment | Prevents phantom hash changes across identical source trees |
Cache Key Component Reference
A cache key is the composite string the CDN hashes to locate a stored object. Mis-scoped components are the most common source of cache fragmentation and incorrect invalidation.
| Component | Type | Default CDN Behaviour | Recommended Rule for Fingerprinted Assets |
|---|---|---|---|
| Scheme (https) | String | Included implicitly | Include |
| Host header | String | Normalised to lowercase | Include; strip redundant www if canonical |
| URI path | String | Included verbatim | Include verbatim; path carries the hash |
| Query string | String | Varies — many CDNs strip it for static MIME types | Exclude entirely; hash lives in path |
| Accept-Encoding | Header | Often excluded | Include to separate compressed from uncompressed variants |
| Cookie / Authorization | Headers | Sometimes included | Exclude; public assets must be stateless |
| Vary: Accept-Encoding | Response header | Triggers separate cache objects per encoding | Required; CDN uses it to split gzip / brotli / identity |
| Cache tags / Surrogate-Key | Response header | Metadata only | Add per-release tag for surgical purge |
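As a rough illustration of the table above, the sketch below assembles a key from the recommended components. The function name buildCacheKey, the request object's shape, and the `|` separator are assumptions made for this example; real CDNs build and hash their keys internally.

```javascript
// Hypothetical helper: composes a cache key from the components recommended above.
function buildCacheKey(request) {
  const url = new URL(request.url);
  const scheme = url.protocol.replace(':', ''); // "https"
  const host = url.host.toLowerCase();          // host names are case-insensitive
  const path = url.pathname;                    // carries the content hash
  // Query string, Cookie and Authorization are deliberately left out.
  // Naive encoding bucket: ignores q-values, which a real normaliser should honour.
  const accept = (request.headers['accept-encoding'] || '').toLowerCase();
  const encoding = accept.includes('br') ? 'br' : accept.includes('gzip') ? 'gzip' : 'identity';
  return [scheme, host, path, encoding].join('|');
}

const a = buildCacheKey({
  url: 'https://Example.com/assets/app.a1b2c3d4.js?utm_source=mail',
  headers: { 'accept-encoding': 'gzip, br', cookie: 'session=1' },
});
const b = buildCacheKey({
  url: 'https://example.com/assets/app.a1b2c3d4.js',
  headers: { 'accept-encoding': 'br' },
});
console.log(a);       // https|example.com|/assets/app.a1b2c3d4.js|br
console.log(a === b); // true
```

Both requests land on the same key even though they differ in host casing, query string, and cookies, which is the behaviour the recommended rules aim for.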
Why Accept-Encoding Belongs in the Key
If a CDN omits Accept-Encoding from the key, a client requesting brotli may receive a previously cached gzip response, or vice versa. Always include it, or configure the CDN to handle content negotiation transparently (Cloudflare’s compression is automatic; CloudFront needs a cache policy with EnableAcceptEncodingGzip and EnableAcceptEncodingBrotli enabled, either a managed policy such as CachingOptimized or a custom one).
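Putting the raw Accept-Encoding header in the key would itself fragment the cache, because browsers send many different strings. A minimal sketch of one way to collapse the header into three buckets follows; the function name normalizeAcceptEncoding and the br > gzip > identity preference order are assumptions for illustration, not any CDN's documented algorithm.

```javascript
// Hypothetical normaliser: collapses any Accept-Encoding header into one of three buckets.
function normalizeAcceptEncoding(header) {
  const accepted = new Set();
  for (const part of (header || '').split(',')) {
    const [name, ...params] = part.trim().toLowerCase().split(';');
    // Skip codings the client explicitly refuses with q=0.
    const refused = params.some((p) => /^\s*q=0(\.0{0,3})?\s*$/.test(p));
    if (name && !refused) accepted.add(name.trim());
  }
  if (accepted.has('br')) return 'br';
  if (accepted.has('gzip')) return 'gzip';
  return 'identity';
}

console.log(normalizeAcceptEncoding('gzip, deflate, br'));  // br
console.log(normalizeAcceptEncoding('br;q=0, gzip;q=0.8')); // gzip
console.log(normalizeAcceptEncoding('deflate'));            // identity
console.log(normalizeAcceptEncoding(undefined));            // identity
```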
Why Vary Must Not Include User-Agent
Vary: User-Agent explodes the cache: every browser string creates a distinct entry for the same bytes. Use Vary: Accept-Encoding only for static assets. Dynamic HTML may also legitimately add Vary: Accept-Language or Vary: Cookie, but keep static assets free of session-scoped headers.
Step-by-Step Implementation
1. Configure the Build Tool
Embed an 8-character content hash in every output filename. Use 12–16 characters in monorepos with thousands of chunks to reduce collision probability further.
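To get a feel for why longer hashes help, the sketch below applies the standard birthday-bound approximation. It assumes hexadecimal hashes that are uniformly distributed, which is an idealisation; the function name collisionProbability is made up for this example.

```javascript
// Birthday-bound estimate: probability that at least two of `files` outputs share
// a hash of `hexChars` hexadecimal characters, assuming uniformly distributed hashes.
function collisionProbability(files, hexChars) {
  const space = Math.pow(16, hexChars);
  const pairs = (files * (files - 1)) / 2;
  return -Math.expm1(-pairs / space); // 1 - e^(-pairs/space), stable for tiny values
}

for (const length of [8, 12, 16]) {
  console.log(length, collisionProbability(5000, length).toExponential(2));
}
```

Under these assumptions, 5,000 files with 8 hex characters give roughly a 0.3% chance of at least one shared hash, and each extra character divides that figure by about 16.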
Vite 5:
// vite.config.js
import { defineConfig } from 'vite';
export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        chunkFileNames: 'assets/[name]-[hash:8].js',
        entryFileNames: 'assets/[name]-[hash:8].js',
        assetFileNames: 'assets/[name]-[hash:8][extname]',
      },
    },
  },
});
Webpack 5:
// webpack.config.js
module.exports = {
  mode: 'production',
  output: {
    filename: 'assets/[name].[contenthash:8].js',
    chunkFilename: 'assets/[name].[contenthash:8].chunk.js',
    assetModuleFilename: 'assets/[name].[contenthash:8][ext][query]',
    clean: true,
  },
};
esbuild 0.20+:
// build.mjs
import * as esbuild from 'esbuild';
await esbuild.build({
  entryPoints: ['src/main.ts'],
  bundle: true,
  minify: true,
  entryNames: 'assets/[name]-[hash]',
  assetNames: 'assets/[name]-[hash]',
  outdir: 'dist',
});
2. Generate and Emit a Manifest
A manifest maps logical names (main.js) to hashed filenames (assets/main-a1b2c3d4.js). The HTML template references the manifest at deploy time rather than hardcoding hashed paths.
Vite emits .vite/manifest.json automatically when build.manifest: true. Webpack uses WebpackManifestPlugin:
// webpack.config.js
const { WebpackManifestPlugin } = require('webpack-manifest-plugin');
module.exports = {
  plugins: [
    new WebpackManifestPlugin({ fileName: 'asset-manifest.json' }),
  ],
};
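The lookup a deploy script performs against the manifest can be sketched as follows. This assumes a flat manifest of logical name to hashed path; the helper names resolveAsset and renderScriptTag are hypothetical, and the manifest contents here are example data.

```javascript
// Assumes a flat manifest of { logicalName: hashedPath }; Vite's manifest nests the
// path under a `file` property, so adapt the lookup to your bundler's format.
function resolveAsset(manifest, logicalName) {
  const hashed = manifest[logicalName];
  if (!hashed) {
    throw new Error(`No manifest entry for "${logicalName}"`);
  }
  return hashed.startsWith('/') ? hashed : `/${hashed}`;
}

function renderScriptTag(manifest, logicalName) {
  return `<script type="module" src="${resolveAsset(manifest, logicalName)}"></script>`;
}

// Example manifest content; in a deploy script this would be read from disk with
// JSON.parse(fs.readFileSync('dist/asset-manifest.json', 'utf8')).
const manifest = { 'main.js': 'assets/main-a1b2c3d4.js' };
console.log(renderScriptTag(manifest, 'main.js'));
// <script type="module" src="/assets/main-a1b2c3d4.js"></script>
```

Failing loudly on a missing entry is a deliberate choice in this sketch: a silent fallback to the unhashed name would ship HTML that points at a URL the CDN rules do not treat as immutable.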
3. Configure Edge Cache Key Rules
Apply cache key overrides per CDN. The pattern is identical: strip query strings, include Accept-Encoding, enforce long TTL.
Cloudflare Cache Rules (Terraform):
resource "cloudflare_ruleset" "static_assets" {
  zone_id = var.zone_id
  name    = "Static asset cache key"
  kind    = "zone"
  phase   = "http_request_cache_settings"
  rules {
    action = "set_cache_settings"
    action_parameters {
      cache = true
      edge_ttl {
        mode    = "override_origin"
        default = 31536000
      }
      browser_ttl {
        mode    = "override_origin"
        default = 31536000
      }
      cache_key {
        ignore_query_strings_order = true
        custom_key {
          # Exclude every query parameter from the key
          query_string { exclude = ["*"] }
          header { include = ["accept-encoding"] }
        }
      }
    }
    # Accepts both name.hash.ext and name-hash.ext; widen the character class
    # if your bundler emits non-hex hashes.
    expression  = "(http.request.uri.path matches \"^/assets/.*[.-][a-f0-9]{8,}\\.(js|css|png|jpg|svg|woff2)$\")"
    description = "Immutable fingerprinted assets"
    enabled     = true
  }
}
AWS CloudFront Cache Policy (JSON):
{
  "CachePolicyConfig": {
    "Name": "fingerprinted-assets",
    "DefaultTTL": 31536000,
    "MaxTTL": 31536000,
    "MinTTL": 0,
    "ParametersInCacheKeyAndForwardedToOrigin": {
      "EnableAcceptEncodingGzip": true,
      "EnableAcceptEncodingBrotli": true,
      "HeadersConfig": { "HeaderBehavior": "none" },
      "CookiesConfig": { "CookieBehavior": "none" },
      "QueryStringsConfig": { "QueryStringBehavior": "none" }
    }
  }
}
Deploy with:
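# The file wraps the policy in a top-level CachePolicyConfig key, so pass it as
# full CLI input rather than as the value of --cache-policy-config.
aws cloudfront create-cache-policy \
  --cli-input-json file://cloudfront-cache-policy.json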
Nginx reverse proxy:
# nginx.conf (inside http {} block)
proxy_cache_path /var/cache/nginx levels=1:2
    keys_zone=assets:64m max_size=10g inactive=365d use_temp_path=off;
server {
    listen 443 ssl;
    server_name example.com;
    # Fingerprinted assets: strip query strings, cache for one year.
    # The regex is quoted because it contains braces; it accepts both
    # name.hash.ext and name-hash.ext.
    location ~* "^/assets/[a-z0-9._-]+[.-][a-f0-9]{8,}\.(js|css|png|jpg|svg|woff2)$" {
        proxy_pass http://origin;
        proxy_cache assets;
        # Key contains scheme + host + path + Accept-Encoding; no query string
        proxy_cache_key "$scheme$proxy_host$uri$http_accept_encoding";
        proxy_cache_valid 200 365d;
        add_header Cache-Control "public, max-age=31536000, immutable";
        add_header Vary "Accept-Encoding";
        # Drop the query string before the request is proxied to the origin
        set $args "";
    }
}
Fastly VCL:
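sub vcl_hash {
  if (req.url.path ~ "^/assets/[a-z0-9._-]+[.-][a-f0-9]{8,}\.(js|css|png|jpg|svg|woff2)$") {
    # Fingerprinted paths: key on host + path, excluding the query string
    set req.hash += req.url.path;
    set req.hash += req.http.host;
    if (req.http.Accept-Encoding) {
      set req.hash += req.http.Accept-Encoding;
    }
  } else {
    # Everything else keeps the default key: full URL (with query string) + host
    set req.hash += req.url;
    set req.hash += req.http.host;
  }
  #FASTLY hash
  return(hash);
}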
Rollup 4 (build configuration, belongs with step 1):
// rollup.config.js
import { defineConfig } from 'rollup';
// Rollup has no [contenthash] placeholder. Its [hash] placeholder is already
// derived from the content of the generated chunk, so no renaming plugin is needed.
export default defineConfig({
  input: 'src/main.js',
  output: {
    dir: 'dist/assets',
    format: 'es',
    chunkFileNames: '[name]-[hash].js',
    entryFileNames: '[name]-[hash].js',
    // The hash stays stable across builds as long as inputs and module order are stable.
  },
});
For monorepos or projects emitting more than a few hundred chunks, increase the hash length to 12 characters in the build configuration. To check whether the current output already contains colliding hashes:
# Verify no hash collisions in the output directory
find dist/assets -name "*.js" | sed 's/.*-\([a-f0-9]*\)\..*/\1/' | sort | uniq -d
# Silence = no collisions. Any output = increase hash length in the build config.
4. Set Response Headers at the Origin
The CDN respects your Cache-Control and Vary headers from the origin. Emit these for every fingerprinted asset:
Cache-Control: public, max-age=31536000, immutable
Vary: Accept-Encoding
Surrogate-Key: release-v2.5.0 static-assets
The Surrogate-Key (Fastly) or Cache-Tag (Cloudflare) header enables surgical purge by release tag without touching individual URLs.
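If the origin is an application server rather than static storage, the header selection might look like the sketch below. The function name originHeaders, the fingerprint regex, and the no-cache fallback for unhashed paths are assumptions chosen for this example.

```javascript
// Hypothetical origin helper: picks response headers based on whether the path is fingerprinted.
// The pattern accepts name.hash.ext and name-hash.ext with a hex hash of 8+ characters.
const FINGERPRINTED = /^\/assets\/.+[.-][a-f0-9]{8,}\.(js|css|png|jpg|svg|woff2)$/i;

function originHeaders(path, releaseTag) {
  if (FINGERPRINTED.test(path)) {
    return {
      'Cache-Control': 'public, max-age=31536000, immutable',
      'Vary': 'Accept-Encoding',
      'Surrogate-Key': `${releaseTag} static-assets`,
    };
  }
  // Conservative default for this sketch: anything without a content hash is revalidated.
  return { 'Cache-Control': 'no-cache' };
}

console.log(originHeaders('/assets/main-a1b2c3d4.js', 'release-v2.5.0'));
console.log(originHeaders('/assets/main.js', 'release-v2.5.0'));
```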
5. Integrate Cache Key Rules into CI/CD
Cache key configuration must be applied before the HTML that references the new assets goes live. A common failure mode: the updated HTML is deployed before the Terraform or Cloudflare ruleset change is applied, so early edge requests for the new assets are processed under the old key rules and cached incorrectly.
The correct order within a CI/CD pipeline is:
- Build assets and emit manifest.
- Upload assets to origin storage (S3, R2, or self-hosted origin).
- Apply CDN configuration changes (Terraform plan → apply, or Cloudflare API ruleset update).
- Deploy updated HTML entry points (index.html) to origin.
- Purge only the HTML cache entries.
# .github/workflows/deploy.yml (partial; shows ordering)
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Build
        run: npm run build
      - name: Upload assets to S3 origin
        run: |
          aws s3 sync dist/assets/ s3://$S3_BUCKET/assets/ \
            --cache-control "public,max-age=31536000,immutable" \
            --metadata-directive REPLACE
      - name: Apply Cloudflare cache rules
        run: terraform apply -auto-approve
        working-directory: infra/cloudflare
      - name: Deploy HTML entry points
        run: |
          aws s3 cp dist/index.html s3://$S3_BUCKET/index.html \
            --cache-control "public,max-age=60,stale-while-revalidate=600"
      - name: Purge HTML cache
        run: |
          curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/purge_cache" \
            -H "Authorization: Bearer $CF_API_TOKEN" \
            -H "Content-Type: application/json" \
            -d '{"files": ["https://example.com/", "https://example.com/index.html"]}'
Invalidation Workflows
Fingerprinted assets self-invalidate: a content change produces a new URL, so old URLs become unreferenced rather than incorrect. The CDN does not need to be told the old asset is stale — it never serves it again because no HTML references it. This is the core operational advantage of path-based fingerprinting over query-parameter versioning.
The only objects that genuinely need purging are HTML entry points, because they carry references to hashed filenames and are typically cached with a much shorter TTL (60–300 seconds is common). When you deploy a new release, the new HTML references the new hashes, and the old HTML must be evicted from the CDN so that users fetch the updated document.
What Requires Purging and What Does Not
| Object Type | Cache Strategy | Invalidation Method |
|---|---|---|
| Fingerprinted JS/CSS (app.a1b2c3d4.js) | max-age=31536000, immutable | None; URL change self-invalidates |
| Fingerprinted images (logo.3f8a1c2d.png) | max-age=31536000, immutable | None |
| HTML entry points (index.html, /) | max-age=60, stale-while-revalidate=600 | Purge by URL on each deploy |
| Service worker (sw.js) | no-cache or max-age=0 | No CDN caching; origin fetched each time |
| API responses | Depends on endpoint | Surrogate-key purge or tag-based purge |
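The table can be turned into a small classifier that a deploy script uses to decide which URLs to purge. This is a sketch under the assumptions that hashes are hexadecimal and at least 8 characters long and that entry points end in .html; invalidationFor and urlsToPurge are hypothetical names.

```javascript
// Hypothetical classifier mirroring the table above.
const HASHED = /[.-][a-f0-9]{8,}\.[a-z0-9]+$/i;

function invalidationFor(path) {
  if (HASHED.test(path)) return 'none';              // new content means a new URL
  if (path === '/' || path.endsWith('.html')) return 'purge-url';
  if (path.endsWith('/sw.js')) return 'not-cached';  // served with no-cache
  return 'review';                                   // e.g. API responses: decide per endpoint
}

function urlsToPurge(origin, paths) {
  return paths.filter((p) => invalidationFor(p) === 'purge-url').map((p) => origin + p);
}

console.log(urlsToPurge('https://example.com', [
  '/', '/index.html', '/sw.js', '/assets/app.a1b2c3d4.js', '/assets/logo.3f8a1c2d.png',
]));
// [ 'https://example.com/', 'https://example.com/index.html' ]
```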
Surrogate-Key Purge Pattern
Tag every asset response with a release identifier so that a whole release can be purged by tag rather than URL by URL. Fingerprinted assets rarely need this, but it is useful when a release’s objects were cached incorrectly, for example with the wrong encoding or headers. Avoid purging a release you may roll back to: a rollback re-points HTML at that release’s hashed filenames and benefits from those objects still being warm in the CDN.
# Fastly: purge all assets tagged with a release
FASTLY_SERVICE_ID="your-service-id"
curl -X POST "https://api.fastly.com/service/${FASTLY_SERVICE_ID}/purge/release-v2.4.0" \
-H "Fastly-Key: $FASTLY_API_TOKEN"
# Cloudflare: purge by cache tag (requires Cache Rules with cache-tag header)
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/purge_cache" \
-H "Authorization: Bearer $CF_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"tags": ["release-v2.4.0"]}'
Origin Shield and Multi-CDN Purge Ordering
When you run an origin shield (a secondary CDN tier in front of your origin), purges must be issued from the inside out: shield first, then the outer edge. Purging the outer edge while the shield still holds the old response causes the outer edge to re-fetch the stale response from the shield and cache it again.
Correct purge order for shielded deployments:
- Deploy new assets to the origin.
- Purge the origin shield cache for the HTML entry points.
- Purge the outer CDN edge for the HTML entry points.
- Verify propagation: run curl -sI from multiple geographic locations and check CF-Cache-Status or X-Cache.
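The ordering constraint in the list above can be sketched as a small async wrapper. Only the sequencing is shown: purgeShield, purgeEdge, and verify are hypothetical callbacks that would wrap your CDN's purge API and a header check.

```javascript
// Sketch of the ordering only: the three callbacks are hypothetical stand-ins.
async function purgeShielded(urls, { purgeShield, purgeEdge, verify }) {
  await purgeShield(urls); // inner tier first, so the edge cannot refill from stale data
  await purgeEdge(urls);   // outer tier second
  return verify(urls);     // e.g. inspect CF-Cache-Status or X-Cache
}

// Usage with stand-in callbacks that only record the call order.
const calls = [];
purgeShielded(['https://example.com/index.html'], {
  purgeShield: async () => calls.push('shield'),
  purgeEdge: async () => calls.push('edge'),
  verify: async () => calls.push('verify'),
}).then(() => console.log(calls.join(' -> '))); // shield -> edge -> verify
```

Awaiting each step before starting the next is the point of the sketch; firing both purges in parallel would reintroduce the race described above.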
Cloudflare’s Tiered Cache Topology handles inner/outer invalidation automatically when you issue an API purge — the purge propagates to all tiers. CloudFront with Regional Edge Caches requires the invalidation to reach all regional caches before the outer edge re-populates; this typically takes 10–60 seconds.
Multi-Tier Cache Key Consistency
A common failure mode in large deployments is key inconsistency across cache tiers. Consider a setup where a corporate reverse proxy sits in front of Cloudflare, which sits in front of an S3 origin. Each tier may apply its own normalisation rules:
- The corporate proxy may strip Accept-Encoding from the key, storing a single variant regardless of encoding.
- Cloudflare normalises tracking parameters but may or may not include your custom Accept-Encoding rule depending on the zone configuration.
- S3 serves pre-compressed files at distinct paths (file.js.gz) rather than negotiating compression via headers.
To detect inconsistency, query each tier separately with the same Accept-Encoding header and compare responses:
# Through the Cloudflare edge. CDNs generally ignore a request Cache-Control header,
# so this inspects what the edge has stored rather than bypassing it.
curl -sI -H "Accept-Encoding: gzip" \
  https://example.com/assets/main-a1b2c3d4.js \
  | grep -i "cf-cache-status\|age\|content-encoding"
# Straight to the S3 origin, skipping every intermediate cache
# (requires a signed S3 URL or a direct-to-origin bypass header)
curl -sI -H "Accept-Encoding: gzip" \
  "https://origin-bucket.s3.amazonaws.com/assets/main-a1b2c3d4.js" \
  | grep -i "content-type\|content-encoding\|cache-control"
If the two responses carry different Content-Encoding values for the same URL and the same Accept-Encoding request header, one of the tiers is probably not including Accept-Encoding in its cache key, which causes the encoding mismatch.
Normalisation Ordering in Proxy Chains
When requests traverse multiple reverse proxies, each proxy’s key normalisation applies to the request it receives — which has already been modified by the upstream proxy. If a load balancer strips Accept-Language before the request reaches Nginx, Nginx’s key normalisation rules for Accept-Language are irrelevant. Always trace the full request path and audit which headers survive to each proxy tier.
A reliable audit technique is to echo request headers from the origin during a diagnostic deploy:
# Temporary origin debug config; remove before production
location /debug-headers {
    add_header X-Received-Accept-Encoding $http_accept_encoding;
    add_header X-Received-Cookie $http_cookie;
    return 200 "ok";
}
Then curl -H "Accept-Encoding: br" https://example.com/debug-headers through the full proxy chain and verify X-Received-Accept-Encoding: br appears in the response. If it does not, a proxy in the chain stripped it before it reached the origin.
Handling Vary Across CDN Generations
The Vary header instructs a shared cache to store separate responses per distinct header value. For fingerprinted static assets, Vary: Accept-Encoding is the only correct entry. Anything else (e.g., Vary: User-Agent, Vary: Cookie) multiplies the number of stored variants per URL by the number of distinct header values the cache sees.
CDN Support Matrix for Vary
| CDN | Vary: Accept-Encoding | Vary: User-Agent | Vary: Cookie |
|---|---|---|---|
| Cloudflare | Handled automatically; Vary on CDN layer is stripped and managed internally | Ignored at edge; unique UA does not create unique cache entry | Respected; bypasses cache for private responses |
| CloudFront | Requires EnableAcceptEncodingGzip: true and EnableAcceptEncodingBrotli: true in cache policy | Creates a separate entry per UA string; avoid | Requires cookie forwarding in cache policy |
| Fastly | Respected; Vary drives cache bucketing at the edge | Creates separate entries per bucket of normalised UAs | Respected if cookies forwarded |
| Nginx (proxy_cache) | Requires explicit proxy_cache_key including $http_accept_encoding | Can be included in proxy_cache_key but causes fragmentation | Requires $cookie_... in key |
Practical rule: emit Vary: Accept-Encoding from the origin for all static assets and nothing else. Let the CDN handle compression negotiation. If a specific route must vary on another header, move it to a dynamic path outside the fingerprinted asset prefix.
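To make the fragmentation effect concrete, here is a simplified model of how a shared cache derives a variant key from Vary. It is a sketch, not any CDN's actual implementation; variantKey and countVariants are illustrative names and the request headers are made-up sample data.

```javascript
// Simplified model of how a shared cache derives a variant key from Vary.
function variantKey(varyHeader, requestHeaders) {
  return varyHeader
    .split(',')
    .map((name) => name.trim().toLowerCase())
    .filter(Boolean)
    .map((name) => `${name}=${requestHeaders[name] || ''}`)
    .join('&');
}

// Counts how many distinct cache entries one URL would occupy for these requests.
function countVariants(varyHeader, requests) {
  return new Set(requests.map((headers) => variantKey(varyHeader, headers))).size;
}

const requests = [
  { 'accept-encoding': 'br', 'user-agent': 'Firefox/126.0' },
  { 'accept-encoding': 'br', 'user-agent': 'Chrome/125.0' },
  { 'accept-encoding': 'br', 'user-agent': 'Safari/17.4' },
  { 'accept-encoding': 'gzip', 'user-agent': 'curl/8.7.1' },
];
console.log(countVariants('Accept-Encoding', requests));             // 2
console.log(countVariants('Accept-Encoding, User-Agent', requests)); // 4
```

With only four sample requests the difference is small, but real User-Agent strings number in the thousands, while the set of encodings stays at a handful.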
Clearing a Corrupted Vary Cache Entry
If you previously emitted Vary: User-Agent for a static asset and then corrected it to Vary: Accept-Encoding, the CDN may hold hundreds of UA-bucketed entries for the same URL. Purge the URL explicitly after deploying the corrected origin headers:
# Cloudflare: purge the affected asset URL to clear all Vary-bucketed entries
curl -X POST "https://api.cloudflare.com/client/v4/zones/$CF_ZONE_ID/purge_cache" \
-H "Authorization: Bearer $CF_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"files": ["https://example.com/assets/main-a1b2c3d4.js"]}'
Verification Commands
After deploying, confirm the cache key is working correctly with targeted header inspections.
# 1. Verify the fingerprinted asset returns a HIT and the correct immutable directive
curl -sI https://example.com/assets/main-a1b2c3d4.js \
| grep -E "^(cache-control|cf-cache-status|x-cache|age|vary):" -i
# Expected output:
# cache-control: public, max-age=31536000, immutable
# cf-cache-status: HIT (or X-Cache: HIT on CloudFront/Nginx)
# age: 4821
# vary: Accept-Encoding
# 2. Confirm identical source produces identical hash across two consecutive builds
npm run build && sha256sum dist/assets/*.js | sort > /tmp/run1.txt
rm -rf dist
npm run build && sha256sum dist/assets/*.js | sort > /tmp/run2.txt
diff /tmp/run1.txt /tmp/run2.txt
# Silence = deterministic. Any diff = phantom hash change.
# 3. Verify compression variant differentiation
curl -sI -H "Accept-Encoding: br" https://example.com/assets/main-a1b2c3d4.js | grep -i content-encoding
curl -sI -H "Accept-Encoding: gzip" https://example.com/assets/main-a1b2c3d4.js | grep -i content-encoding
# Both should return their respective encoding, confirming separate cache entries
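The same header checks can be automated in a post-deploy script. The sketch below takes a plain object of lower-cased response headers and lists problems; auditAssetHeaders is a hypothetical name, and the one-year max-age check mirrors the values used throughout this page.

```javascript
// Hypothetical checker: takes response headers (lower-cased names) and lists problems.
function auditAssetHeaders(headers) {
  const problems = [];
  const cacheControl = (headers['cache-control'] || '').toLowerCase();
  if (!/max-age=31536000/.test(cacheControl)) problems.push('max-age is not one year');
  if (!cacheControl.includes('immutable')) problems.push('immutable directive missing');
  const vary = (headers['vary'] || '')
    .toLowerCase()
    .split(',')
    .map((v) => v.trim())
    .filter(Boolean);
  if (!vary.includes('accept-encoding')) problems.push('Vary lacks Accept-Encoding');
  const extra = vary.filter((v) => v !== 'accept-encoding');
  if (extra.length > 0) problems.push(`unexpected Vary entries: ${extra.join(', ')}`);
  return problems;
}

console.log(auditAssetHeaders({
  'cache-control': 'public, max-age=31536000, immutable',
  vary: 'Accept-Encoding',
})); // []
console.log(auditAssetHeaders({ 'cache-control': 'public, max-age=3600', vary: 'User-Agent' }));
```

In a Node 18+ script the input could come from a HEAD request, for example Object.fromEntries((await fetch(url, { method: 'HEAD' })).headers), since fetch exposes header names in lower case.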
Edge Cases and Known Issues
Vary: Accept-Encoding Not Emitted by Origin
If the origin omits Vary: Accept-Encoding, Cloudflare and Nginx may cache only one compression variant and serve it to every client. A client that did not advertise that encoding then receives a byte stream it may be unable to decode. Fix: add Vary: Accept-Encoding in your origin server configuration and purge the cache after the change.
CloudFront Forwarding Headers You Did Not Intend
CloudFront’s “Legacy Default” cache policy forwards Host but not Accept-Encoding unless explicitly included. Always create a custom cache policy rather than using the managed defaults for fingerprinted assets.
Hash Change Without Source Change (Phantom Hash)
Non-deterministic bundler inputs — timestamps in source maps, randomised module IDs, filesystem order differences — produce different hashes for the same source. The result is a cache miss on every deployment even for unchanged files. Verify with the double-build diff above. See the deterministic build outputs guide for root-cause diagnosis.
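A scripted version of the double-build comparison can report which outputs changed rather than just that something did. The sketch below works on in-memory contents keyed by logical name; digestOutputs and phantomChanges are hypothetical helpers, and the two sample builds are invented data showing an embedded timestamp.

```javascript
const { createHash } = require('node:crypto');

// Maps each output's logical name to a digest of its bytes.
// `outputs` is { logicalName: Buffer | string }, a stand-in for files read from dist/.
function digestOutputs(outputs) {
  const digests = {};
  for (const [name, bytes] of Object.entries(outputs)) {
    digests[name] = createHash('sha256').update(bytes).digest('hex');
  }
  return digests;
}

// Returns the logical names whose bytes differ between two builds of the same source.
function phantomChanges(buildA, buildB) {
  const a = digestOutputs(buildA);
  const b = digestOutputs(buildB);
  return Object.keys({ ...a, ...b }).filter((name) => a[name] !== b[name]).sort();
}

const run1 = { 'main.js': 'console.log(1)', 'vendor.js': '/* built 2024-01-01T00:00:00Z */' };
const run2 = { 'main.js': 'console.log(1)', 'vendor.js': '/* built 2024-01-01T00:00:05Z */' };
console.log(phantomChanges(run1, run2)); // [ 'vendor.js' ]
```

Comparing by logical name rather than by hashed filename matters here, because a phantom change alters the filename itself.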
Surrogate-Key / Cache-Tag Size Limits
Cloudflare’s cache-tag purge is limited to 16 KB per response header and 30 purge requests per second on the default plan. Fastly Surrogate-Key headers are limited to 16 KB. If you have thousands of assets per release, purge at the tag level rather than by individual URL, and batch API calls.
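Batching purge calls can be sketched as a pure function that splits tags into request bodies. The batchSize value is an assumption to be replaced with your CDN's documented per-request limit; buildPurgeBatches is a hypothetical name, and the body shape follows the {"tags": [...]} payload used in the Cloudflare example earlier.

```javascript
// Splits purge targets into fixed-size request bodies. `batchSize` is an assumption:
// check your CDN's documented per-request limit before choosing a value.
function buildPurgeBatches(tags, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const unique = [...new Set(tags)]; // duplicate tags would waste request capacity
  const batches = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push({ tags: unique.slice(i, i + batchSize) });
  }
  return batches;
}

const batches = buildPurgeBatches(
  ['release-v2.4.0', 'release-v2.3.9', 'release-v2.4.0', 'static-assets'],
  2,
);
console.log(JSON.stringify(batches));
// [{"tags":["release-v2.4.0","release-v2.3.9"]},{"tags":["static-assets"]}]
```

Each element can then be sent as the JSON body of one purge request, with a delay between requests to stay under the rate limit.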
immutable Directive Ignored by Some Proxies
Intermediate proxies (corporate caches, ISP caches) may not honour immutable. This is acceptable: they still respect max-age, and because every content revision gets a new URL, a cached copy of a fingerprinted file is never stale. Among browsers, Firefox and Safari honour immutable; Chrome does not implement the directive but already avoids revalidating subresources on reload.
Performance Impact
| Strategy | Cache Hit Ratio | Origin Requests per Deploy | Key Construction Overhead |
|---|---|---|---|
| Filename hash, no query string | ~99% after warm-up | Zero (old keys stay valid) | Negligible — direct URI lookup |
| Query param (?v=hash), CDN normalised | ~50–80% | Full miss per new version | Low; regex param filter |
| Query param, CDN not normalised | ~20–60% | Constant miss per param variant | High; per-param fragmentation |
| Timestamp (?t=epoch) | ~0% effective | 100% on every deploy | High; unbounded key space |
Fingerprinted filenames consistently produce the highest hit ratios because the URL uniquely identifies a content revision. No normalisation rule at the CDN is needed — the key is simply the path.
Frequently Asked Questions
Should cache keys include the full query string for fingerprinted assets?
No. Once the hash is in the filename, the query string adds no information and can only fragment the cache. Strip it at the edge for all paths matching the fingerprint pattern. If you must support query-param versioning alongside filename hashing, apply separate cache key rules per path prefix.
Does Accept-Encoding always need to be in the cache key?
Yes, unless the CDN performs transparent compression negotiation and stores a single canonical response that it re-encodes on the fly for each client (Cloudflare’s automatic compression handling works this way). For self-hosted Nginx or CloudFront without on-the-fly re-encoding, omitting Accept-Encoding from the key risks serving mismatched compression. Include it.
What is the operational impact of non-deterministic cache keys?
Cache fragmentation raises origin egress cost, reduces edge hit ratios, and makes invalidation unpredictable. In extreme cases, non-deterministic keys prevent any two requests from sharing a cache object, effectively disabling the CDN layer. Fix the root cause in the build pipeline first; tuning edge rules cannot compensate for unbounded key variation.
How do I roll back to a previous asset version after a bad deploy?
Because old fingerprinted URLs remain valid on the CDN (they were never purged — they simply became unreferenced), a rollback is an HTML-level operation: re-point the HTML entry points to the previous hashed filenames. CDN objects for those old hashes are still live if within their TTL. The full procedure is covered in rolling back cache keys after a bad deploy.
Related
- Implementing cache keys: query parameters vs filenames — side-by-side comparison, normalization configs, decision matrix
- Rolling back cache keys after a bad deploy — re-pointing HTML to prior hashed URLs, CDN considerations, roll-forward vs roll-back decision
- Content hashing vs semantic versioning — choosing between content-driven and release-driven version identifiers
- Deterministic build outputs — eliminating phantom hash changes in CI
- Static asset fingerprinting fundamentals — parent overview