Article 8: Deep Dive - The Caching Strategies
The Speed Layer
In our system design, we established that a URL shortener is extremely Read-Heavy (100:1 to 1000:1 ratio). This makes caching our most powerful tool. If we can serve a redirect from memory without hitting the database, we can handle aggressive traffic spikes with minimal cost.
This article details the “Caching-First” architecture.
1. The Multi-Layer Caching Strategy
We don’t just put a cache “in front” of the database. We implement caching at multiple network boundaries.
The Layers of Defense
- Level 1: The Browser (The Client)
- Mechanism: HTTP 301 Permanent Redirect.
- Latency: 0ms (Instant).
- Cost: $0.
- Trade-off: Once cached by the browser, you lose analytics visibility. The user doesn’t even talk to you.
- Level 2: The Edge (CDN)
- Mechanism: A Content Delivery Network (like CloudFront or Cloudflare) caches the 301 response close to the user’s city.
- Latency: 10-50ms.
- Hit Rate: High for global viral links.
- Level 3: The Server (Redis)
- Mechanism: A distributed Redis cluster in our data center.
- Latency: 1-5ms (internal network).
- Purpose: Protects the database from the “Thundering Herd” of requests that pass through the CDN (e.g., long-tail content).
Architecture Diagram
```mermaid
graph TD
    User[User] -->|1. GET /abc| CDN
    subgraph "External Internet"
        CDN["CDN (Cloudflare)"]
    end
    CDN -.->|Hit: Return 301| User
    CDN -->|Miss| LB[Load Balancer]
    subgraph "Our Data Center"
        LB --> API[API Server]
        API -->|2. Check| Redis[(Redis Cluster)]
        Redis -.->|Hit: Return URL| API
        Redis -.->|Miss| API
        API -->|3. Fetch| DB[(PostgreSQL)]
    end
    style CDN fill:#4ECDC4
    style Redis fill:#FFE66D
    style DB fill:#FF6B6B
```
2. Dealing with the “Thundering Herd”
A classic distributed system problem is the Thundering Herd (or Cache Stampede).
The Scenario:
- A Justin Bieber tweet goes out with a new link `short.app/jb`.
- 10,000 users click it in the first second.
- The cache is empty (cold).
- All 10,000 requests miss the cache simultaneously.
- All 10,000 requests hit the Database at the exact same millisecond.
- Database Crashes.
The Solution: Request Coalescing (or “Single-Flight”)

Instead of letting all requests hit the DB, the API server coordinates them.
- Request 1 comes in: “I need `jb`.” The API sees it’s missing and flags “I am fetching `jb`.”
- Requests 2-10,000 come in: the API checks the flag. “Oh, `jb` is being fetched. I will wait right here.”
- Request 1 returns from the DB with the URL.
- API updates the cache and notifies the 9,999 waiting requests.
- They all return the data without touching the DB.
We reduced 10,000 queries to 1 query.
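The coordination above can be sketched in a few lines of Python. This is a minimal in-process illustration, not production code: `fetch_fn` is a hypothetical database lookup, and a plain dict stands in for Redis. The key idea is that only the first thread to miss becomes the “leader” and queries the DB; everyone else blocks on an event until the leader fills the cache.

```python
import threading

class SingleFlightCache:
    """Request-coalescing sketch: concurrent misses for the same key
    trigger exactly one call to the backing store (hypothetical fetch_fn)."""

    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn       # e.g. a database lookup
        self.cache = {}                # in-memory stand-in for Redis
        self.lock = threading.Lock()
        self.in_flight = {}            # key -> Event signalling "fetch done"

    def get(self, key):
        with self.lock:
            if key in self.cache:                 # cache hit
                return self.cache[key]
            event = self.in_flight.get(key)
            if event is None:                     # first miss: we lead the fetch
                event = threading.Event()
                self.in_flight[key] = event
                leader = True
            else:                                 # someone is already fetching
                leader = False
        if leader:
            value = self.fetch_fn(key)            # the single DB query
            with self.lock:
                self.cache[key] = value
                del self.in_flight[key]
            event.set()                           # wake the waiting requests
            return value
        event.wait()                              # followers block here
        with self.lock:
            return self.cache[key]
```

In a multi-server deployment the same effect is usually achieved with a short-lived lock key in Redis rather than an in-process event, but the control flow is identical.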
3. Cache Eviction Policies
We ran out of RAM! What do we delete? Since we can’t cache 10 billion links in RAM, we need a smart eviction policy.
Strategy: LRU (Least Recently Used)
This is the standard choice. If a link hasn’t been clicked in a while, it’s dropped. Set `maxmemory-policy allkeys-lru` in the Redis configuration.
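The eviction semantics are easy to see in miniature. The sketch below models strict LRU over a bounded number of entries (Redis’s `allkeys-lru` is an approximation of this, sampled over memory rather than entry count):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU-eviction sketch: when capacity (here, max_entries)
    is exhausted, the least recently used key is dropped first."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.data = OrderedDict()          # iteration order == recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)  # evict the LRU entry
```

For a URL shortener this policy fits naturally: a link that hasn’t been clicked recently is exactly the link we can afford to re-fetch from the database.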
Strategy: The “20/80” Pre-loading
- The theory: 20% of links drive 80% of traffic.
- The implementation: We don’t just wait for clicks. We run a daily job that queries our analytics (“What were the top 100k links yesterday?”) and loads those links into Redis before the traffic wakes up. This is called Cache Warming.
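A warming job can be as simple as the sketch below. Both names here are assumptions for illustration: `fetch_top_links` stands in for the analytics query, and a plain dict stands in for the Redis client.

```python
def warm_cache(cache, fetch_top_links, top_n=100_000):
    """Cache-warming sketch: preload yesterday's hottest links before
    peak traffic. fetch_top_links is a hypothetical analytics query
    returning [(short_code, long_url), ...] ordered by click count."""
    for short_code, long_url in fetch_top_links(top_n):
        cache[f"shortURL:{short_code}"] = long_url
```

In production this would run on a schedule (e.g. a cron job before the morning peak) and write to Redis with the same TTL as organically cached entries.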
4. Operational Details (Redis)
Memory Sizing Calculation
- Key: `shortURL:abc1234` (15 bytes)
- Value: `https://very-long-url.com...` (100 bytes avg)
- Total Entry: ~150 bytes (with overhead).
- Capacity: 16GB RAM can hold ~100 Million hot links.
- $(16 \times 10^9) / 150 \approx 106,666,666$ links.
- Verdict: A single mid-sized Redis instance can handle the working set of even a massive startup.
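The same back-of-envelope arithmetic, expressed as a tiny helper:

```python
def redis_capacity(ram_bytes, entry_bytes=150):
    """Back-of-envelope capacity: how many ~150-byte cache entries
    (key + value + overhead) fit in a given amount of RAM."""
    return ram_bytes // entry_bytes

links = redis_capacity(16 * 10**9)   # a 16 GB instance
# roughly 106 million entries, matching the estimate above
```

Note this ignores Redis’s own process overhead and replication buffers, so real-world capacity will be somewhat lower; the order of magnitude is what matters for the verdict.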
Resilience (Redis Sentinel)
Redis is fast, but if it dies, our database might die (see Thundering Herd). We use Redis Sentinel or Cluster Mode for high availability.
- Master: Handles writes (setting new cache).
- Slaves: Handle reads (lookups).
- Sentinel: Watches the Master. If Master dies, it promotes a Slave to Master automatically within seconds.
Summary
By implementing a robust 3-layer caching strategy and solving the “Thundering Herd” problem, we have effectively protected our database. Our system can now handle millions of reads per second.
But we still have one problem: Writes. Every time someone clicks, we need to count it. If we have 1M clicks/sec, we have 1M writes/sec. Our database can’t handle that. In the next article, we solve this with Asynchronous Processing.
Layer 2: Redis
- Typical Window: A few milliseconds (most traffic hits the cache)
- How it works:
- API checks Redis for short code
- If found: return cached long URL (5ms)
- If not found: query PostgreSQL, update Redis, return
- After 1 hour: Redis key expires, next request queries DB (refreshes)
- Guarantee: Within 1 hour of last access, data is consistent
- Tradeoff: Between 1-hour refreshes, URL metadata might change (rare)
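The read-through flow described above can be sketched as follows. This is an in-process illustration under stated assumptions: `db_lookup` stands in for the PostgreSQL query, a dict stands in for Redis, and the clock is injectable so expiry can be tested without waiting an hour.

```python
import time

class ReadThroughCache:
    """Read-through sketch for Layer 2: check the cache first, fall back
    to the database on a miss, and expire entries after ttl seconds
    (the 1-hour window above). db_lookup is a hypothetical DB query."""

    def __init__(self, db_lookup, ttl=3600, clock=time.monotonic):
        self.db_lookup = db_lookup
        self.ttl = ttl
        self.clock = clock                    # injectable for testing
        self.store = {}                       # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > self.clock():
            return entry[0]                   # hit: ~1-5 ms in real Redis
        value = self.db_lookup(key)           # miss: ~10-30 ms DB query
        self.store[key] = (value, self.clock() + self.ttl)
        return value
```

With a real Redis client, the store and expiry would instead be a `SETEX`-style write with a 3600-second TTL; the control flow is the same.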
Layer 3: PostgreSQL (Source of Truth)
- Consistency Window: Immediate (strong consistency)
- How it works:
- All writes go directly to PostgreSQL (via transaction)
- Immediate durable writes
- Reads go through Redis/CDN first (async invalidation)
- Guarantee: Database is always correct; caches are eventually consistent
Scenarios: How Consistency Works in Practice
Scenario 1: User deletes a short link
```
Timeline:
T=0:     User clicks "Delete" for link /abc123
T=1ms:   Database UPDATE: is_deleted = true
T=2ms:   API returns 404 for /abc123 requests
T=3ms:   CloudFront cache invalidation initiated
T=100ms: CloudFront edge cache purged
T=1min:  All CDN PoPs cleared (eventual)

Result: Within 100ms to 1 minute, all users see 404 (worst case)
```
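The deletion timeline maps onto a short sequence of calls. In the sketch below, `db`, `redis_cache`, and `cdn` are hypothetical client objects (the method names are assumptions for illustration); a real CDN purge, such as a CloudFront invalidation, completes asynchronously over seconds to minutes, as the timeline shows.

```python
def delete_link(short_code, db, redis_cache, cdn):
    """Deletion sketch for Scenario 1: soft-delete in the source of
    truth first, then invalidate each cache layer in turn."""
    db.soft_delete(short_code)                       # T=1ms: is_deleted = true
    redis_cache.pop(f"shortURL:{short_code}", None)  # T=2ms: Redis entry purged
    cdn.invalidate(f"/{short_code}")                 # T=3ms: async CDN purge begins
```

Ordering matters here: the database is updated first so that any cache miss during the purge window already resolves to a 404.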
Scenario 2: User updates destination URL
```
Cannot happen! Short codes are immutable after creation.
If user wants to change destination: must delete + recreate link.
```
Scenario 3: Database is down
```
Timeline:
T=0:       PostgreSQL becomes unavailable
T=0-30s:   Redis serves cached data (100% cache hit rate)
T=30s:     CDN serves cached data (70% hit rate)
T=30-3600s: CDN continues serving, data is 1h old

Result: If DB down for < 1 hour, users see no impact (cached)
        If DB down for > 1 hour, users see stale data
```
Real-World Impact
| Operation | Consistency Guarantee | User Impact |
|---|---|---|
| Create link | Immediate (DB written) | New link available after 1-2 seconds (CDN/Redis miss) |
| Redirect (exists) | 1 hour eventual (cached) | See correct URL from cache (< 30ms) |
| Update created link | N/A (immutable) | Not possible - must delete & recreate |
| Delete link | 100-3600s eventual | Stops working after cache expires (worst case 1h) |
| Database outage | Served from cache | Users see cached data for up to 1 hour |
Acceptable Risk Assessment
✅ Acceptable (rare in practice):
- User deletes link, but CDN serves 301 for up to 1 hour
- Database is down, users see cached data from up to 1 hour ago
❌ Not Acceptable (implement differently):
- User expects real-time consistency (< 100ms) across all data
- Need GDPR right-to-be-forgotten (immediate deletion everywhere)
Mitigation for strict requirements:
- Add cache invalidation API: immediately purge CDN + Redis
- Cost: $0.005 per invalidation (5 deletions = $0.025)
- For high-volume deletions (100+/day): use bulk invalidation ($20/month)
Cost Breakdown (Caching-First)
| Component | Cost/Month | RPS Capacity |
|---|---|---|
| CloudFront (CDN) | $28 | Unlimited (edge) |
| Redis (3 nodes) | $1,500 | 6,000 RPS |
| PostgreSQL | $150 | 500 RPS |
| Load Balancer | $50 | Unlimited |
| TOTAL | $1,728 | 600+ RPS ✓ |
Comparison:
- MVP (no cache): database limited to 100-200 RPS
- With caching: 600+ RPS at the same cost!
Summary: Caching-First Approach
Three layers:
- CDN: 70% hit, 10-50ms, $28/month
- Redis: 60% of remainder, 1-5ms, $1,500/month
- Database: Fallback + source of truth, 10-30ms, $150/month
Key improvements:
- Database load reduced 90% (2,900 RPS → 290 RPS)
- Latency improved (100ms → 15-20ms median)
- Cost-effective ($1,728/month for 600 RPS)
- Async analytics (no impact on redirect latency)
Operational:
- Monitor cache hit rate (60%+ expected)
- Detect and handle hotspots
- Warm cache on server restart
- Simple, proven approach
Next: Deep Dive 2 - Async-Everything (extreme performance).