MD5 vs SHA-256 for Assets

Selecting the correct cryptographic hash algorithm for static asset fingerprinting directly impacts build velocity, cache hit ratios, and supply chain security. While MD5 remains prevalent in legacy pipelines, SHA-256 is the modern standard for production-grade deployments. This guide evaluates computational trade-offs, collision probabilities, and implementation patterns for Webpack, Vite, and esbuild.

Algorithm selection dictates URL entropy, CDN routing efficiency, and invalidation workflows. For foundational architecture patterns, review Static Asset Fingerprinting Fundamentals before modifying pipeline configurations.

Hash Algorithm Performance & Collision Probability

Cryptographic overhead and mathematical collision space must be quantified before standardizing a hash function across your CI/CD environment.

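| Metric | MD5 | SHA-256 |
|---|---|---|
| Output length (hex) | 32 characters | 64 characters |
| Accidental collision probability (full digest) | Negligible (128-bit output) | Negligible (256-bit output) |
| Deliberate collision resistance | Broken; practical collision attacks exist | About 2¹²⁸ operations; no known practical attack |
| Throughput (modern CI, hardware dependent) | ~1.2 GB/s | ~500–800 MB/s, higher with SHA hardware extensions |
| SRI support | Not supported | Supported (alongside SHA-384 and SHA-512) |
| CPU overhead | Negligible | Typically <1% of total build time |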

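At full digest length, accidental collisions are not a practical concern for either algorithm: by the birthday bound, one million unique assets hashed with MD5 collide with a probability on the order of 10⁻²⁷. MD5's real weakness is deliberate collisions, which an attacker can construct cheaply, so it must not be relied on wherever a hash doubles as a security check. SHA-256 remains collision resistant and adds negligible overhead on modern x86_64 or ARM64 runners. Note that truncating either digest to 8 hex characters leaves only 32 bits, so accidental-collision risk in filenames is governed by the truncation length rather than by the algorithm.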
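To put numbers on accidental collisions, the birthday approximation can be evaluated directly. The sketch below is illustrative only; the function name and the asset counts are assumptions chosen for the example, not figures from any particular pipeline.

```javascript
// Birthday-bound approximation: p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
function collisionProbability(numAssets, hashBits) {
  const pairs = (numAssets * (numAssets - 1)) / 2;
  const space = 2 ** hashBits;
  // Math.expm1 keeps precision when the probability is extremely small.
  return -Math.expm1(-pairs / space);
}

// [label, number of unique assets, effective hash bits] (example values)
const cases = [
  ['full MD5 digest (128 bits)', 1e6, 128],
  ['8 hex characters (32 bits)', 1e4, 32],
  ['12 hex characters (48 bits)', 1e4, 48],
];

for (const [label, n, bits] of cases) {
  const p = collisionProbability(n, bits);
  console.log(`${label}, ${n} assets: ${p.toExponential(2)}`);
}
```

Each hex character contributes 4 bits, so the second and third cases describe truncated filename hashes, where the result is the same whether the underlying function is MD5 or SHA-256.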

Validate algorithm throughput locally before scaling:

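# Benchmark MD5 vs SHA-256 on a 50 MB test file
dd if=/dev/urandom of=test.bin bs=1M count=50 2>/dev/null
time openssl dgst -md5 test.bin
time openssl dgst -sha256 test.bin
# Synthetic per-block-size throughput for comparison
openssl speed md5 sha256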
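Because bundlers hash inside Node.js rather than through the openssl CLI, it can also be worth measuring in that runtime. The following is a rough sketch, assuming a single in-memory buffer is representative enough; the helper name is hypothetical and results will vary with hardware and Node version.

```javascript
const crypto = require('node:crypto');

// Hash an in-memory buffer once and return throughput in MB/s.
function measureThroughput(algorithm, sizeBytes) {
  const data = crypto.randomBytes(sizeBytes);
  const start = process.hrtime.bigint();
  crypto.createHash(algorithm).update(data).digest();
  const elapsedSeconds = Number(process.hrtime.bigint() - start) / 1e9;
  return sizeBytes / (1024 * 1024) / elapsedSeconds;
}

const SIZE = 50 * 1024 * 1024; // 50 MB, matching the shell benchmark
for (const algorithm of ['md5', 'sha256']) {
  const mbPerSecond = measureThroughput(algorithm, SIZE);
  console.log(`${algorithm}: ${mbPerSecond.toFixed(0)} MB/s`);
}
```

A single run is noisy; repeating the measurement a few times and taking the median gives a steadier comparison.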

Algorithm selection must align with deployment scale and risk tolerance. When evaluating release strategies, contrast algorithmic fingerprinting against manual tagging in Content Hashing vs Semantic Versioning.

Build System Integration & Configuration

Modern bundlers require explicit hash algorithm declarations and truncation parameters to prevent URL bloat while maintaining collision resistance.

Webpack Configuration

Webpack's filename templates accept only a length on the hash token (for example [contenthash:8]); the algorithm is set separately through output.hashFunction. Truncate to 8–12 characters to balance safety and brevity.

// webpack.config.js
module.exports = {
  mode: 'production',
  output: {
    // Use SHA-256 for every hash webpack generates
    hashFunction: 'sha256',
    // Content hash truncated to 8 hex characters
    filename: '[name].[contenthash:8].js',
    chunkFilename: '[name].[contenthash:8].chunk.js',
    // Static assets (images, fonts, CSS)
    assetModuleFilename: 'assets/[name].[contenthash:8][ext]'
  },
  optimization: {
    // Stable module ids keep hashes unchanged when unrelated modules are added
    moduleIds: 'deterministic',
    runtimeChunk: 'single'
  }
};
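For intuition about what a truncated [contenthash] token represents, the sketch below derives a fingerprinted filename from file content. It is an illustration under simplifying assumptions, not webpack's internal implementation; the function name and sample contents are hypothetical.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Illustration only: hash the content with SHA-256, keep the first `length`
// hex characters, and splice them between the base name and the extension.
function fingerprintedName(fileName, content, length = 8) {
  const digest = crypto.createHash('sha256').update(content).digest('hex');
  const { name, ext } = path.parse(fileName);
  return `${name}.${digest.slice(0, length)}${ext}`;
}

// Changing the content changes the fingerprint, and therefore the URL.
console.log(fingerprintedName('main.js', Buffer.from('console.log("v1");')));
console.log(fingerprintedName('main.js', Buffer.from('console.log("v2");')));
```

Identical content always yields the same name, which is what lets a CDN cache the file indefinitely.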

Vite Configuration

Vite leverages Rollup under the hood. Rollup's [hash] placeholder accepts an optional length, such as [hash:10], but not an algorithm name; the hash function is chosen internally by Rollup and varies by version.

// vite.config.js
import { defineConfig } from 'vite';

export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        // Content hash truncated to 10 characters (algorithm chosen by Rollup)
        assetFileNames: 'assets/[name]-[hash:10][extname]',
        chunkFileNames: 'chunks/[name]-[hash:10].js',
        entryFileNames: 'entry-[name]-[hash:10].js'
      }
    }
  }
});

Step-by-Step Implementation Workflow

  1. Audit existing assets: find dist/ -type f -exec md5sum {} +
  2. Update bundler config: In webpack, replace legacy [hash] tokens with [contenthash:N] and set output.hashFunction to 'sha256'; in Vite/Rollup, use [hash:N].
  3. Enforce deterministic builds: Strip timestamps, locale variables, and non-reproducible metadata.
  4. Verify output consistency: Run builds across two isolated CI runners and diff manifests.
  5. Deploy & monitor: Track CDN cache miss rates post-deployment to validate routing efficiency.
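The manifest comparison in step 4 can be sketched as a small Node script. This is a minimal example, assuming each runner's build output is available as a local directory; the script name and directory arguments are hypothetical.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Map each file under rootDir (by relative path) to its full SHA-256 hex digest.
function buildManifest(rootDir, dir = rootDir, manifest = {}) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      buildManifest(rootDir, fullPath, manifest);
    } else if (entry.isFile()) {
      const bytes = fs.readFileSync(fullPath);
      manifest[path.relative(rootDir, fullPath)] = crypto
        .createHash('sha256')
        .update(bytes)
        .digest('hex');
    }
  }
  return manifest;
}

// Sorted paths whose digest differs, or that exist in only one manifest.
function diffManifests(a, b) {
  const paths = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...paths].filter((p) => a[p] !== b[p]).sort();
}

// Hypothetical usage: node compare-builds.js runner-a/dist runner-b/dist
if (process.argv.length >= 4) {
  const drift = diffManifests(
    buildManifest(process.argv[2]),
    buildManifest(process.argv[3])
  );
  console.log(drift.length === 0 ? 'Builds are identical' : `Drift in: ${drift.join(', ')}`);
  process.exitCode = drift.length === 0 ? 0 : 1;
}
```

Because fingerprinted filenames change whenever content changes, drift typically shows up as paths present on only one side rather than as mismatched digests.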

Hash drift across distributed environments indicates non-deterministic compilation steps. Resolve pipeline inconsistencies by following Deterministic Build Outputs.

CDN Cache Key Generation & Invalidation

CDN edge nodes parse the full URL to generate cache keys. Hash encoding format and length directly impact routing efficiency and storage allocation.

Encoding & URL Constraints

  • Hex Encoding: Standard for build tools. SHA-256 yields 64 characters; MD5 yields 32.
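  • Base64 Encoding: Reduces length by roughly one third relative to hex (a SHA-256 digest is 44 characters with padding, 43 without), but the standard alphabet includes +, /, and =, so use the URL-safe variant (- and _ in place of + and /, with padding omitted).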
  • Legacy Edge Limits: Older CDN configurations may truncate URLs at 256 characters or reject non-alphanumeric sequences. Always validate origin rules.
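The length differences between encodings can be checked directly in Node.js. This is a small sketch; it assumes a Node version that supports the 'base64url' encoding, and the sample input string is arbitrary.

```javascript
const crypto = require('node:crypto');

// Length of a digest in a given encoding, for some sample content.
function digestLength(algorithm, encoding) {
  return crypto
    .createHash(algorithm)
    .update('sample asset content')
    .digest(encoding).length;
}

for (const algorithm of ['md5', 'sha256']) {
  for (const encoding of ['hex', 'base64', 'base64url']) {
    const length = digestLength(algorithm, encoding);
    console.log(`${algorithm} ${encoding}: ${length} characters`);
  }
}
```

Digest length depends only on the algorithm and encoding, never on the input, so the printed values hold for any asset.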

Cache Invalidation Workflow

Content hashing eliminates manual cache purging for updated assets. However, shared dependencies or misconfigured origin rules require targeted invalidation.

# Purge specific CDN paths (generic pattern; the endpoint and payload shape
# are placeholders, so check your provider's purge API before use)
curl -X POST "https://api.cdn-provider.com/v1/purge" \
  -H "Authorization: Bearer $CDN_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      "/assets/main.a1b2c3d4.js",
      "/assets/vendor.e5f6a7b8.js"
    ]
  }'

Deployment Strategy Alignment

When hash length constraints or rollback requirements dictate routing behavior, evaluate How to choose between content hash and version hash.

Security Implications & Subresource Integrity (SRI)

Subresource Integrity lets the browser cryptographically verify third-party and first-party assets before executing them. The W3C specification defines SHA-256, SHA-384, and SHA-512 as the supported algorithms. MD5 is cryptographically broken and is not a recognized SRI algorithm: browsers ignore an md5- integrity value, so it provides no protection.

SRI Generation Pipeline

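Never reuse cache-busting fingerprints for integrity verification. Filename fingerprints are truncated and, depending on the bundler, may not be computed over the exact bytes that are served, whereas SRI requires the full Base64-encoded digest of the delivered file. Generate the integrity hash separately to maintain security posture without compromising cache efficiency.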

# Generate SRI hash for production deployment (hash the exact file that is served)
echo "sha256-$(openssl dgst -sha256 -binary dist/assets/main.a1b2c3d4.js | openssl base64 -A)"
# Output: sha256-<base64_encoded_hash>

Inject the generated string into your HTML templates:

<script src="/assets/main.a1b2c3d4.js" 
 integrity="sha256-<base64_encoded_hash>" 
 crossorigin="anonymous"></script>

Automate SRI injection using build plugins (webpack-subresource-integrity, vite-plugin-sri) to prevent manual drift between served assets and DOM attributes.
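Where a plugin is not an option, the integrity value can be computed in a build script. The sketch below is a minimal example, not the implementation used by either plugin; the function name and the sample content are hypothetical.

```javascript
const crypto = require('node:crypto');

// Build an SRI value: "<algorithm>-<base64 digest of the exact served bytes>".
function sriValue(content, algorithm = 'sha256') {
  const allowed = ['sha256', 'sha384', 'sha512'];
  if (!allowed.includes(algorithm)) {
    throw new Error(`Unsupported SRI algorithm: ${algorithm}`);
  }
  const digest = crypto.createHash(algorithm).update(content).digest('base64');
  return `${algorithm}-${digest}`;
}

// Hypothetical asset content; in a pipeline, read the built file instead,
// for example with fs.readFileSync on the emitted bundle.
console.log(sriValue(Buffer.from('console.log("hello");')));
```

SRI uses standard Base64 with padding, not the URL-safe variant, so the value may contain +, /, and = characters.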

Common Pitfalls & Resolutions

| Issue | Root Cause | Resolution |
|---|---|---|
| Cache collisions from aggressive truncation | A 6-character hex token such as [contenthash:6] carries only 24 bits, so collisions become likely in large monorepos regardless of the underlying algorithm. | Keep at least 8 characters and prefer SHA-256 as the underlying function. Keep MD5 only where legacy tooling mandates it. |
| Inconsistent hashes across CI runners | Non-deterministic steps (timestamps, env vars, locale differences) alter file contents before hashing. | Enforce deterministic pipelines, strip metadata, and standardize Node.js/OS environments across all runners. |
| CDN rejecting long SHA-256 URLs | Legacy edge nodes impose strict URL length limits or block non-alphanumeric cache keys. | Configure the CDN to accept 64-character hex strings, switch to Base64URL encoding, or apply origin-level rewrite rules. |

Frequently Asked Questions

Is MD5 still acceptable for CDN cache busting? Yes, strictly for cache busting, where no security property depends on the hash. SHA-256 is still recommended because its overhead is negligible and it can be shared with integrity tooling; MD5 must never be used for integrity checks such as SRI.

Does SHA-256 significantly slow down build times? No. Modern CPUs process SHA-256 at >500MB/s. The cryptographic overhead typically accounts for <1% of total compilation time.

Can I use different hash algorithms for CSS and JavaScript? Technically yes, but it complicates pipeline maintenance, SRI generation, and CDN cache key normalization. Standardizing on SHA-256 across all asset types is industry best practice.

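How does hash length affect CDN cache hit ratios? Hash length does not change hit ratios directly; a hit depends on the URL staying stable while the content is unchanged. Longer hashes reduce collision risk at the cost of longer URLs and cache keys. An 8–12 character truncation of SHA-256 balances collision safety and brevity for most asset counts.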