Fingerprinting in HTTP Headers

Static assets deployed with content-hashed filenames still fail to cache correctly when HTTP headers contradict the filename contract — understanding the full dual-layer strategy is what separates a cache that works from one that silently wastes origin bandwidth.

When to Use Header-Level Fingerprinting

Not every project needs every header technique described here. Use this decision matrix to pick the right combination for your deployment.

Scenario | Recommended approach
SPA with Vite/webpack, assets in /dist | Cache-Control: public, max-age=31536000, immutable on asset paths; no-cache on HTML
Multi-region CDN (Cloudflare, Fastly) | Add Surrogate-Control or Surrogate-Key tags for tag-based purging
Files served from Nginx directly | Disable Nginx's default mtime/size ETags; inject content-hash ETags from build manifest
AWS CloudFront with S3 origin | Set Cache-Control metadata on S3 objects at upload time; CloudFront inherits it
Monorepo with thousands of chunks | Use 12–16 hex character hashes; default 8-char hashes risk collisions at scale
Legacy clients, no immutable support | Keep max-age=31536000; add stale-while-revalidate=86400 as fallback
Assets served behind a cookie-based auth layer | Strip Vary: Cookie at the CDN edge; keep assets on a separate cookieless domain

Prerequisites

Before applying the configurations in this guide, confirm the following:

  • Nginx 1.7.5+: the always parameter of add_header was introduced in 1.7.5. Without it, add_header skips error responses such as 404 and 5xx (304 replies are covered either way).
  • OpenSSL 1.1.1+ on the build host — needed if you are generating SHA-256 ETags with openssl dgst -sha256.
  • jq 1.6+ — used in the manifest-to-map conversion script.
  • Cloudflare Cache Rules — available on all paid Cloudflare plans (Pro and above for custom rules; Free plan supports Page Rules with more limited control).
  • AWS CloudFront: Cache-Control headers set on S3 object metadata propagate automatically; no additional CloudFront distribution config is required for basic max-age / immutable behaviour.
  • Browser support for immutable: Chrome 99+, Firefox 49+, Safari 17.2+, Edge (Chromium) 99+. Legacy IE and older Safari fall back gracefully to the max-age TTL.

HTTP Header Configuration Reference

Header | Type | Default (no config) | Effect on fingerprinted assets
Cache-Control: max-age | Integer seconds | Browser heuristic (~10% of Last-Modified age) | Sets absolute TTL; use 31536000 (1 year) for immutable assets
Cache-Control: immutable | Flag | Absent | Tells the browser not to revalidate during TTL; eliminates conditional requests
Cache-Control: no-cache | Flag | Absent | Forces revalidation before each use; required for HTML entry points
Cache-Control: stale-while-revalidate | Integer seconds | Absent | Serves stale copy while fetching fresh; useful as legacy fallback
ETag | String | Nginx generates an ETag from file mtime and size | Validation token; must be content-hash-derived for deterministic caching
Last-Modified | HTTP date | File system mtime | Less reliable than ETag for fingerprinted assets; mtime varies across nodes
Vary | Header name list | None | Instructs CDN to key cache on listed request headers; must be kept minimal
Surrogate-Control | Directive string | None | Varnish/Fastly override for CDN-side TTL, independent of browser Cache-Control
Surrogate-Key | Space-separated tags | None | Fastly/Varnish tag-based purge; lets you purge all assets for a release atomically

Implementation: Step by Step

Step 1 — Decide on a hash length

Eight hex characters (32 bits) is safe for most projects. By the birthday bound, a 32-bit hash reaches a 1% chance of at least one collision at roughly 9,300 files and a 50% chance at roughly 77,000, which covers almost all single-application builds. For monorepos or build pipelines emitting many thousands of chunks, move to 12 or 16 characters. All examples below use 8 chars; a comment marks where to change the cut length.
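These thresholds follow from the birthday approximation p ≈ 1 − exp(−n(n−1) / (2·16^c)) for n files and c hex characters. The sketch below is one way to check a planned hash length against an expected file count. The function names are illustrative, and the formula assumes hash values are uniformly distributed.

```shell
# collision_probability FILES HEX_CHARS
# Prints the approximate chance that at least two files share a truncated hash.
collision_probability() {
  awk -v n="$1" -v c="$2" 'BEGIN {
    space = 16 ^ c                       # number of distinct hash values
    printf "%.6f\n", 1 - exp(-(n * (n - 1)) / (2 * space))
  }'
}

# files_at_probability PROBABILITY HEX_CHARS
# Prints the file count at which the collision chance reaches PROBABILITY.
files_at_probability() {
  awk -v p="$1" -v c="$2" 'BEGIN {
    space = 16 ^ c
    printf "%d\n", sqrt(2 * space * log(1 / (1 - p)))
  }'
}

collision_probability 9300 8     # chance of a collision among 9,300 files at 8 chars
files_at_probability 0.01 12     # file count for a 1% chance at 12 chars
```

The first call lands very close to 0.01, matching the 1% figure for 8 characters at about 9,300 files.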

Step 2 — Generate content-hash ETags at build time

Do not rely on Nginx’s default ETag generation: it is derived from the file’s mtime and size, and mtime changes between deployments (and differs across nodes) even when file content is identical. Instead, generate hashes during the build and write them to a manifest.

#!/usr/bin/env bash
# scripts/generate-asset-manifest.sh
# Produces dist/asset-manifest.json mapping URL paths → content hashes
set -euo pipefail

DIST_DIR="./dist/assets"
MANIFEST="./dist/asset-manifest.json"

printf '{\n' > "$MANIFEST"
first=true

for file in "$DIST_DIR"/*; do
  [[ -f "$file" ]] || continue
  filename=$(basename "$file")
  # Change cut -c1-8 to cut -c1-12 or cut -c1-16 for monorepos
  hash=$(openssl dgst -sha256 -hex "$file" | awk '{print $2}' | cut -c1-8)
  if [ "$first" = true ]; then
    first=false
  else
    printf ',\n' >> "$MANIFEST"
  fi
  printf '  "/assets/%s": "%s"' "$filename" "$hash" >> "$MANIFEST"
done

printf '\n}\n' >> "$MANIFEST"
echo "Manifest written to $MANIFEST"

The output looks like:

{
  "/assets/app.a1b2c3d4.js": "a1b2c3d4",
  "/assets/vendor.e5f6a7b8.js": "e5f6a7b8",
  "/assets/main.9c0d1e2f.css": "9c0d1e2f"
}
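The reason a content-derived validator survives redeploys can be checked locally. The sketch below writes the same bytes to two files with different modification times, standing in for two nodes that received a deploy at different moments, and shows that the truncated SHA-256 is identical. The helper name sha256_short and the file names are illustrative; the helper prefers sha256sum and falls back to openssl dgst -r.

```shell
# sha256_short FILE [LENGTH]
# Prints the first LENGTH hex characters (default 8) of the file's SHA-256.
sha256_short() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1"
  else
    openssl dgst -sha256 -r "$1"    # -r prints "<digest> *<file>", like sha256sum
  fi | cut -c1-"${2:-8}"
}

workdir=$(mktemp -d)
printf 'console.log("hello");\n' > "$workdir/node-a.js"
printf 'console.log("hello");\n' > "$workdir/node-b.js"

# Same bytes, different mtimes (format: CCYYMMDDhhmm)
touch -t 202401010000 "$workdir/node-a.js"
touch -t 202406010000 "$workdir/node-b.js"

hash_a=$(sha256_short "$workdir/node-a.js")
hash_b=$(sha256_short "$workdir/node-b.js")
echo "node-a: $hash_a  node-b: $hash_b"

rm -rf "$workdir"
```

Both values come out equal, whereas a validator built from mtime would differ between the two files.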

Step 3 — Configure Nginx with content-hash ETags and immutable caching

Convert the manifest to an Nginx map block at deploy time, then include it in the server configuration. This avoids a server restart — only a reload is needed.

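#!/usr/bin/env bash
# scripts/build-nginx-etag-map.sh
# Run after generate-asset-manifest.sh
set -euo pipefail

# Emit one map entry per asset:   "/assets/app.a1b2c3d4.js" "a1b2c3d4";
# Note: if nginx.conf already includes conf.d/*.conf at the http level, write
# this file outside that glob and point the map block's include at the new path.
jq -r 'to_entries[] | "  \"\(.key)\" \"\(.value)\";"' \
  dist/asset-manifest.json \
  > /etc/nginx/conf.d/etag-entries.conf

# Validate the full configuration before reloading
nginx -t && nginx -s reload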
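Because the generated file is interpolated straight into the Nginx configuration, it can be worth checking its shape before the reload. The sketch below assumes asset names contain only letters, digits, dots, underscores and hyphens. validate_etag_entries is a hypothetical helper, not part of Nginx or jq.

```shell
# validate_etag_entries FILE
# Succeeds only if FILE is non-empty and every line has the form
#   "/assets/<name>" "<8 to 64 hex chars>";
validate_etag_entries() {
  [ -s "$1" ] || { echo "no entries in $1" >&2; return 1; }
  # grep -v prints (and succeeds on) any line that does NOT match the pattern
  if grep -Evn '^  "/assets/[A-Za-z0-9._-]+" "[0-9a-f]{8,64}";$' "$1" >&2; then
    echo "malformed entries in $1, skipping reload" >&2
    return 1
  fi
}

# Demonstration on a temporary file instead of /etc/nginx
entries=$(mktemp)
printf '  "/assets/app.a1b2c3d4.js" "a1b2c3d4";\n' > "$entries"
validate_etag_entries "$entries" && echo "entries look valid"
rm -f "$entries"
```

In a deploy script, the call would sit between the jq step and nginx -t.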

Full Nginx server block:

# /etc/nginx/sites-available/myapp.conf

# Map URI → content-hash ETag value (populated at deploy time)
map $uri $asset_etag {
    default "";
    include /etc/nginx/conf.d/etag-entries.conf;
}

server {
    listen 443 ssl http2;
    server_name example.com;

    root /var/www/myapp/dist;
    index index.html;

    # ── Fingerprinted static assets ─────────────────────────────────────────
    location ~* ^/assets/.*\.(js|css|woff2?|png|jpg|webp|svg|ico)$ {
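        # Disable Nginx's default mtime/size-based ETag; we set our own below
        etag off;

        # Build the quoted ETag value from the deploy-time map.
        # "set" inside "if" is safe, whereas add_header inside "if" would
        # suppress the location-level add_header directives below.
        # An empty value (unknown file) makes add_header emit no ETag at all.
        set $asset_etag_header "";
        if ($asset_etag) {
            set $asset_etag_header "\"$asset_etag\"";
        }
        add_header ETag $asset_etag_header always;

        # One-year TTL + immutable: browsers skip revalidation entirely
        add_header Cache-Control "public, max-age=31536000, immutable" always;

        # Accept-Encoding only: prevent cache fragmentation
        add_header Vary "Accept-Encoding" always;

        # Never let a stray Set-Cookie leak onto static responses
        # (only has an effect if this location proxies to an upstream)
        proxy_hide_header Set-Cookie;

        # No "expires" directive here: it would emit a second Cache-Control header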
    }

    # ── HTML entry points ────────────────────────────────────────────────────
    location ~* \.html$ {
        # Force revalidation on every request — HTML references hashed URLs
        # so the browser must always fetch the freshest entry point
        add_header Cache-Control "no-cache, no-store, must-revalidate" always;
        add_header Pragma "no-cache" always;
        # "expires 0" is omitted: it would add a second Cache-Control header (max-age=0)
    }

    # ── Everything else ──────────────────────────────────────────────────────
    location / {
        try_files $uri $uri/ /index.html;
        add_header Cache-Control "public, max-age=3600" always;
    }
}

Step 4 — Configure Cloudflare Cache Rules

Cloudflare Cache Rules (dashboard → Caching → Cache Rules) let you override Cache-Control at the edge without touching your origin config. Create two rules in order:

Rule 1 — Fingerprinted assets (highest priority)

IF  URI Path matches regex  ^/assets/.*\.(js|css|woff2?|png|webp|svg|ico)$
THEN
  Cache eligibility: Eligible for cache
  Edge TTL: Override — 1 year (31536000 seconds)
  Browser TTL: Override — 1 year
  Respect origin Cache-Control: disabled (use rule values)
  Set response header: Cache-Control = public, max-age=31536000, immutable

Rule 2 — HTML entry points

IF  URI Path matches regex  \.html$  OR  URI Path equals  /
THEN
  Cache eligibility: Bypass cache
  Set response header: Cache-Control = no-cache, no-store, must-revalidate

By default, Cloudflare may weaken or remove an origin ETag when it compresses a response at the edge. To preserve your content-hash ETags, either:

  • Enable “Respect Strong ETags” in Cloudflare’s Speed → Optimization settings (Cloudflare then uses a variant ETag rather than stripping it), or
  • Disable Cloudflare’s automatic compression for the asset path and handle Content-Encoding at the origin.

Step 5 — AWS CloudFront note

CloudFront forwards Cache-Control headers from S3 object metadata to the browser unchanged and uses them to set the edge TTL by default (when the “Cache based on selected request headers” behaviour is set to “None (Improves Caching)”). Upload fingerprinted assets with --cache-control "public, max-age=31536000, immutable" in the S3 metadata at CI time:

aws s3 cp dist/assets/ s3://my-bucket/assets/ \
  --recursive \
  --cache-control "public, max-age=31536000, immutable" \
  --metadata-directive REPLACE

CloudFront does not inject immutable automatically — you must set it on the S3 object or via a CloudFront Function / Lambda@Edge response handler.
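When HTML and fingerprinted assets are uploaded by the same CI job, the header value has to be chosen per object. The sketch below shows one such mapping, mirroring the policy used in the Nginx configuration in Step 3. cache_control_for is a hypothetical helper, and the printf stands in for the aws s3 cp call so the snippet runs without AWS credentials.

```shell
# cache_control_for KEY
# Prints the Cache-Control value for an object key relative to the bucket root.
cache_control_for() {
  case "$1" in
    *.html)   echo "no-cache, no-store, must-revalidate" ;;   # entry points
    assets/*) echo "public, max-age=31536000, immutable" ;;   # fingerprinted files
    *)        echo "public, max-age=3600" ;;                  # everything else
  esac
}

# Dry run: print the header each object would be uploaded with
for key in index.html assets/app.a1b2c3d4.js favicon.ico; do
  printf '%s -> %s\n' "$key" "$(cache_control_for "$key")"
done
```

The printed value is what would be passed to --cache-control for each object.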

Step 6 — Add Surrogate-Key headers for tag-based purging (Fastly / Varnish)

When you use Fastly or Varnish in front of an origin, Surrogate-Key (Fastly) or Xkey (Varnish Plus) lets you purge entire sets of assets atomically by release tag, without knowing every individual URL. This is the server-side complement to filename hashing — see cache key architecture for the broader strategy.

Add a release tag header at your origin:

# In the Nginx assets location block, add:
add_header Surrogate-Key "assets release-2024-10-01" always;
add_header Surrogate-Control "max-age=31536000" always;

Surrogate-Control sets the CDN-side TTL without affecting the Cache-Control header the browser sees. After a deploy, purge all assets for the old release in one API call:

# Fastly instant purge by Surrogate-Key tag
curl -X POST "https://api.fastly.com/service/${FASTLY_SERVICE_ID}/purge/release-2024-10-01" \
  -H "Fastly-Key: ${FASTLY_API_KEY}"

Dual-Layer Caching Strategy Diagram

[Figure: Dual-layer caching, filename hash plus HTTP headers. Browser (Layer 1: URL hash app.a1b2c3d4.js; Layer 2: Cache-Control: immutable) → CDN edge node (cache key = URL path /assets/app.a1b2c3d4.js; Vary: Accept-Encoding; HIT served from the edge with a one-year TTL, MISS fetched from origin) → Origin (Nginx) (ETag: "a1b2c3d4"; Cache-Control: public, max-age=31536000, immutable; Surrogate-Key: assets release-2024-10; Last-Modified as secondary fallback only).]
Dual-layer caching strategy: the filename hash creates a unique cache key at every layer; HTTP headers dictate how long each layer holds that key and whether revalidation is required.

Verification Commands

Run these after each deploy to confirm headers are set correctly at both origin and CDN.

# 1. Check Cache-Control and ETag on a fingerprinted asset at the CDN edge
curl -sI "https://example.com/assets/app.a1b2c3d4.js" | grep -Ei "cache-control|etag|vary|cf-cache-status|x-cache"

# Expected output:
# cache-control: public, max-age=31536000, immutable
# etag: "a1b2c3d4"
# vary: Accept-Encoding
# cf-cache-status: HIT   ← Cloudflare; CloudFront and Fastly report x-cache instead

# 2. Confirm the edge honours the content-hash ETag as a validator
curl -sI "https://example.com/assets/app.a1b2c3d4.js" \
  -H 'If-None-Match: "a1b2c3d4"' | head -1
# A matching validator normally yields "HTTP/2 304" from the CDN cache.
# immutable only changes browser behaviour (no conditional request on reload),
# so it cannot be observed with curl; check it in the browser's network panel

# 3. Confirm HTML entry point is not cached
curl -sI "https://example.com/" | grep -i cache-control
# Expected: cache-control: no-cache, no-store, must-revalidate

# 4. Verify ETag matches build manifest hash
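EXPECTED=$(jq -r '.["/assets/app.a1b2c3d4.js"]' dist/asset-manifest.json)
# Strip quotes, the trailing CR from the header line, and any weak-validator prefix
ACTUAL=$(curl -sI "https://example.com/assets/app.a1b2c3d4.js" | grep -i '^etag:' | awk '{print $2}' | tr -d '"\r' | sed 's|^W/||')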
echo "Manifest: $EXPECTED  CDN: $ACTUAL"
[ "$EXPECTED" = "$ACTUAL" ] && echo "MATCH" || echo "MISMATCH — check deploy"

# 5. Check Vary header does not include Cookie or User-Agent
curl -sI "https://example.com/assets/app.a1b2c3d4.js" | grep -i "^vary:"
# Must NOT show: vary: Cookie, vary: User-Agent

# 6. For Fastly: verify Surrogate-Key tag is present on origin responses
curl -sI "https://origin.example.com/assets/app.a1b2c3d4.js" | grep -i surrogate
# Expected: surrogate-key: assets release-2024-10-01
#           surrogate-control: max-age=31536000
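For CI, the manual checks above can be folded into a single pass/fail gate. The sketch below reads raw response headers on standard input, so it can be fed from curl -sI or from a saved capture. check_asset_headers is an illustrative name, and the rules encode only the contract described in this guide.

```shell
# check_asset_headers
# Reads response headers on stdin; returns 0 only if they match the
# fingerprinted-asset contract (long max-age, immutable, minimal Vary, no cookie).
check_asset_headers() {
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  cache_control=$(printf '%s\n' "$headers" | sed -n 's/^cache-control:[[:space:]]*//p')
  vary=$(printf '%s\n' "$headers" | sed -n 's/^vary:[[:space:]]*//p')
  rc=0

  case "$cache_control" in
    *max-age=31536000*) ;;
    *) echo "FAIL: max-age=31536000 missing" >&2; rc=1 ;;
  esac
  case "$cache_control" in
    *immutable*) ;;
    *) echo "FAIL: immutable missing" >&2; rc=1 ;;
  esac
  case "$vary" in
    *cookie*|*user-agent*) echo "FAIL: Vary fragments the cache: $vary" >&2; rc=1 ;;
  esac
  if printf '%s\n' "$headers" | grep -q '^set-cookie:'; then
    echo "FAIL: Set-Cookie present on a static asset" >&2; rc=1
  fi
  return "$rc"
}

# Sample capture; in a pipeline: curl -sI "$URL" | check_asset_headers
printf 'HTTP/2 200\r\ncache-control: public, max-age=31536000, immutable\r\nvary: Accept-Encoding\r\n\r\n' \
  | check_asset_headers && echo "headers OK"
```

A non-zero exit status from the function is enough to fail a deploy job.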

Edge Cases and Known Issues

ETag stripping by CDN compression

When a CDN compresses a response (gzip/brotli), it changes the byte content and therefore invalidates the strong ETag. Cloudflare converts a strong ETag to a variant ETag (appending -gzip or -br) rather than stripping it. Fastly strips ETags on compression by default. Mitigation options:

  • Pre-compress assets at build time (.js.gz, .js.br) and serve the pre-compressed file directly, disabling on-the-fly compression for those paths. The ETag then matches the pre-compressed bytes and remains stable.
  • On Cloudflare: enable “Respect Strong ETags” in Speed → Optimization. Cloudflare then transforms the ETag to match the compressed variant rather than generating a new weak one.
  • On Nginx: use gzip_static on and brotli_static on (the latter requires the third-party ngx_brotli module) so Nginx serves pre-compressed files directly, keeping the ETag you injected intact.

immutable ignored by older clients

Safari added immutable support in version 17.2 (late 2023). Older Safari and all IE clients treat the directive as unknown and fall back to the max-age TTL only — they will still revalidate at the end of max-age, but the behaviour is correct, just slightly less optimal. No special fallback is needed beyond including a long max-age.

Last-Modified unreliability across nodes

Last-Modified reflects the filesystem mtime of the served file. In a multi-node deployment, even when file content is identical, mtime will differ across nodes if they received files at slightly different timestamps. A client switching nodes between requests may receive a 200 instead of a 304, wasting bandwidth. For cache key architecture decisions, treat Last-Modified as a secondary fallback only — ETag is the authoritative validator.

Vary: Cookie cache fragmentation

If your application sets a session cookie on responses, even on static asset responses, and your CDN is configured to Vary: Cookie, the CDN creates a separate cache entry per distinct Cookie header value. A site with 10,000 users will generate 10,000 cache entries for the same file. Fix: serve static assets from a cookieless subdomain (assets.example.com) and ensure no Set-Cookie header appears on asset responses.
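The fragmentation is easy to reproduce on paper: a cache that honours Vary: Cookie keys each entry on the URL plus the Cookie value. The sketch below counts distinct cache keys for a synthetic log of 1,000 requests to one asset, each from a different session. The log format and the count_cache_entries helper are made up for illustration.

```shell
# count_cache_entries MODE
# Reads "<url> <cookie>" lines on stdin and prints the number of distinct cache keys.
#   MODE=cookie  key on URL plus Cookie value (the effect of Vary: Cookie)
#   anything else keys on the URL only
count_cache_entries() {
  if [ "$1" = "cookie" ]; then
    awk '{ print $1 "|" $2 }'
  else
    awk '{ print $1 }'
  fi | sort -u | wc -l | tr -d ' '
}

# 1,000 requests for the same fingerprinted file, one per session
requests=$(awk 'BEGIN { for (i = 1; i <= 1000; i++) print "/assets/app.a1b2c3d4.js session=" i }')

printf '%s\n' "$requests" | count_cache_entries url      # prints 1
printf '%s\n' "$requests" | count_cache_entries cookie   # prints 1000
```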

AWS CloudFront and the immutable directive

CloudFront does not natively interpret immutable to modify its own TTL behaviour — it uses Cache-Control: max-age for the edge TTL by default. The immutable flag passes through to the browser unchanged, where it does take effect. If you need CloudFront to honour a longer or independent edge TTL, set a custom “Default TTL” and “Maximum TTL” in the CloudFront cache behaviour. S3 object metadata Cache-Control takes precedence over the CloudFront default TTL when present.

Cloudflare Cache-Control overrides

Cloudflare’s default “Browser Cache TTL” setting (under Caching → Configuration) can override origin Cache-Control headers for the browser. Set it to “Respect Existing Headers” to ensure your origin-set max-age=31536000, immutable reaches the browser unchanged.

Hash length and collision risk

The default 8 hex characters (32 bits of entropy) gives a collision probability of ~1% at around 9,300 files. Most single-app projects stay well below this. However, monorepos with multiple apps sharing a single CDN bucket, or build pipelines that emit thousands of code-split chunks, should move to 12 chars (48 bits, ~1% collision probability at around 2.4 million files) or 16 chars (64 bits, effectively collision-proof). See hash algorithm choice for detailed collision probability tables.

Performance Impact

The dual-layer strategy eliminates the most expensive category of browser-to-origin requests: conditional revalidation on unchanged assets.

Without immutable: A cached asset is reused without network traffic while it is fresh, but when the user reloads the page (or once max-age expires) the browser may send an If-None-Match request for each cached asset. A page with 20 assets then generates up to 20 conditional requests. Even if all return 304 Not Modified, each round-trip adds latency, typically 20–100 ms per hop depending on geography.

With immutable: The browser skips conditional requests entirely for the max-age duration. For a one-year TTL, the only origin (or CDN) fetch per asset happens once per browser. Subsequent page loads serve everything from the browser’s local disk cache with zero network activity for those assets.

CDN hit ratio effect: Because the cache key is the full fingerprinted URL path, and that URL never changes for a given file content, CDN hit ratios for fingerprinted assets routinely exceed 99% once the cache warms. Non-fingerprinted assets keyed on plain paths with Vary: Cookie or Vary: User-Agent often see hit ratios below 50% on high-traffic sites.

Surrogate-Key purge speed: Tag-based purging on Fastly propagates globally in under 150 ms. Compared to path-based purge loops (one API call per file), tag purging is O(1) from the operator’s perspective regardless of how many assets share the tag. For immutable TTL tuning details including staggered deployment patterns, see the linked guide.

Bandwidth savings from pre-compression + stable ETags: Serving brotli-pre-compressed assets (.js.br) avoids on-the-fly compression CPU cost at the origin and CDN, while keeping ETags stable across compressions.
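The with/without immutable comparison above can be condensed into a rough model of the browser-side decision. It follows RFC 8246, where immutable tells the client not to send conditional requests on reload while the response is still fresh. Real browsers differ in their reload heuristics, so this is a sketch rather than a description of any particular engine. The function name and argument order are illustrative.

```shell
# browser_action AGE MAX_AGE IMMUTABLE RELOAD
#   AGE, MAX_AGE  seconds;  IMMUTABLE, RELOAD  "yes" or "no"
# Prints "cache" (no network) or "revalidate" (conditional request).
browser_action() {
  if [ "$1" -ge "$2" ]; then
    echo "revalidate"            # stale: freshness lifetime has elapsed
  elif [ "$4" = "yes" ] && [ "$3" != "yes" ]; then
    echo "revalidate"            # fresh, but a reload may trigger If-None-Match
  else
    echo "cache"                 # fresh: served from the local cache
  fi
}

browser_action 3600 31536000 no  no    # cache
browser_action 3600 31536000 no  yes   # revalidate
browser_action 3600 31536000 yes yes   # cache
```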

FAQ

Why do I need both a content hash in the filename and headers like Cache-Control: immutable — isn’t one of them enough?

The filename hash ensures that every unique content version has a unique URL, so browser and CDN caches never serve stale content at a known URL. But without Cache-Control: immutable, browsers may still send a conditional request (If-None-Match) when the user reloads the page, even for a URL whose content can never change. The immutable directive tells the browser: “for the entire max-age window, do not even attempt to revalidate.” The two layers solve different problems: the filename hash handles cache busting; the header handles revalidation frequency. Removing either layer degrades caching efficiency. For a detailed comparison of the ETag vs immutable strategies, see the linked page.

Should I disable ETags entirely for fingerprinted assets, or keep them?

Keep strong, content-hash-derived ETags even when using immutable. The immutable directive affects browser-to-CDN revalidation; ETags are used when the CDN itself revalidates against the origin (for example on CDN TTL expiry). Without an ETag, the origin has to answer that revalidation with a full 200 response rather than a 304, wasting bandwidth on the CDN-to-origin leg. Disable only the server’s automatic mtime/size-based ETag generation (etag off in Nginx), and replace it with the build-manifest-derived value.

What Cache-Control header should I set on the HTML entry point (index.html)?

Use Cache-Control: no-cache, no-store, must-revalidate on index.html and any other HTML entry points. The HTML file is the only document that references all the hashed asset URLs. If the browser caches a stale HTML file, it requests old asset URLs — which may still be cached at the CDN but represent the previous deployment. Keeping HTML uncached guarantees the browser always fetches the latest manifest of hashed references, while every referenced asset itself is cached aggressively. The extra latency of fetching HTML on every navigation is negligible compared to the asset payload it references.

How does Surrogate-Key purging interact with fingerprinted filenames during a rolling deploy?

During a rolling deploy, old asset filenames (old hashes) and new asset filenames (new hashes) coexist on the CDN simultaneously. You should not purge old-hash URLs during the deploy — in-flight HTML responses referencing old hashes must still resolve. Instead, tag new assets with the new release tag and old assets with the old release tag. Only purge the old release tag after HTML delivery has fully transitioned to the new deployment (typically after your old instance terminates). Alternatively, let old-hash assets expire naturally — their max-age=31536000 means they occupy CDN cache space but are never requested again once all HTML is updated.
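The ordering rule can be expressed as a guard in the deploy pipeline. The sketch below assumes the orchestrator can list running instances together with the release each one serves. The input format, the instance names, the release-2024-10-08 tag and the safe_to_purge helper are all hypothetical.

```shell
# safe_to_purge OLD_TAG
# Reads "<instance> <release-tag>" lines on stdin.
# Succeeds only when no running instance still serves OLD_TAG.
safe_to_purge() {
  if awk -v old="$1" '$2 == old { found = 1 } END { exit !found }'; then
    return 1    # an instance on the old release may still emit old-hash HTML
  fi
  return 0
}

# Mid-rollout: web-2 is still on the old release, so the purge is deferred
printf 'web-1 release-2024-10-08\nweb-2 release-2024-10-01\n' \
  | safe_to_purge release-2024-10-01 || echo "old release still live, not purging"
```

Only after the guard succeeds would the purge call for the old release tag run.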