Article 6: Scaling Strategy & Proposed Solutions
The Evolution of Architecture
We have analyzed our “Basic Approach” and found it wanting. The single database limits our throughput, and synchronous writes kill our latency. The system works for a prototype, but it won’t survive a product launch.
Now, we face the classic engineering question: How do we scale?
There isn’t just one answer. We can scale for read speed (caching), write throughput (async processing), or data volume (sharding). Below, we propose three distinct architectural evolutions.
1. The Core Constraint: The Analytics vs. Speed Trade-off
Before picking a solution, we must address a unique constraint of URL Shorteners: The 301 vs. 302 Trap.
- Option A (301 Permanent Redirect): The browser caches the mapping. Fast for users, but we lose analytics because subsequent clicks never hit our server.
- Option B (302 Temporary Redirect): The browser hits our server every single time. We get perfect analytics, but our server load explodes (100% of traffic).
The Verdict: For a business like Bitly, Analytics is the product. We must use Option B (or 301 with short expiry). This means we cannot rely on browser caching. We must build a massive server-side caching layer.
Solution 1: “Separation of Concerns” (The Standard Pattern)
The Pragmatic Approach: Caching & Asynchronous Queues
This is the industry-standard pattern for read-heavy applications like ours. Most mid-sized systems (handling 1k - 10k RPS) stop here.
1. The Strategy
- Fix Reads (The Cache): Since we force traffic to our server (see 302 Trap above), we protect the database by placing Redis in front. 90% of requests never touch the disk.
- Fix Writes (The Queue): We stop blocking the user for non-critical tasks. We move analytics to a “Fire-and-Forget” background queue (Kafka/RabbitMQ).
2. Architecture Diagram
graph TD
User[User] -->|GET /abc| LB[Load Balancer]
LB --> API[API Server]
%% The Read Path
subgraph Read Path [Optimization: Caching]
API -->|1. Check| Redis[(Redis Cache)]
Redis -.->|Hit: Return| API
Redis -.->|Miss| API
end
%% The Write Path
subgraph Write Path [Optimization: Async]
API -->|2. Miss: Read DB| DB[(PostgreSQL)]
API -->|3. Async Event| Queue[Message Queue]
Queue --> Worker[Analytics Worker]
Worker -->|Batch Write| AnalyticsDB[(Analytics DB)]
end
style Redis fill:#6BCB77
style Queue fill:#FF9F1C
3. Why this is the Standard
- Latency: Reads hitting Redis take <1ms. Writes return immediately.
- Cost: Redis is cheaper than scaling a master database.
- Simplicity: It keeps the reliable PostgreSQL as the source of truth.
Solution 2: “The Hyperscale Pivot” (NoSQL)
The High-Scale Approach: DynamoDB / Cassandra
When you reach “Bitly scale” (billions of links), a single PostgreSQL master hits a hard limit on storage size and connection count.
1. The Strategy
- Key-Value Fit: URL Shortening is the perfect use case for NoSQL. The data model is a simple Key-Value pair (
short_code->long_url). We don’t need complex JOINs. - Key Generation Service (KGS): Since NoSQL lacks “Auto-Increment”, we often introduce a separate “Key Generation Service” that pre-generates unique 6-char strings and stores them in a database, ready to be grabbed.
2. Architecture Diagram
graph TD
User --> LB
LB --> API
API -->|Read/Write| NoSQL[(DynamoDB Global Table)]
subgraph "Write Optimization"
KGS[Key Gen Service] -->|Pre-generated Keys| API
end
3. The Trade-off
- Pros: Virtually infinite scale. AWS handles the operational burden.
- Cons: No interactions/joins. Analytics queries (“Count clicks for User X”) become very hard and usually require a separate data warehouse (like Snowflake/Redshift).
Solution 3: “The Sharded Monolith” (Legacy/Manual)
The Engineering Heavyweight: Manual Sharding
What if we need SQL features (ACID compliance, complex queries) but have too much data for one machine? This was the standard before NoSQL matured.
1. The Strategy
- Manual Sharding: We run 10 separate PostgreSQL instances.
- Routing Logic: The application code decides where data lives.
Shard_ID = Hash(short_code) % 10.
2. The Trade-off
- Pros: We keep SQL relationships.
- Cons: Operational Nightmare. Resharding data (moving from 10 to 20 DBs) without downtime is one of the hardest problems in distributed systems operations.
Recommendation: The Path Forward
| Feature | Solution 1 (Cache + Async) | Solution 2 (NoSQL) | Solution 3 (Sharding) |
|---|---|---|---|
| Throughput | High | Very High | High |
| Complexity | Low | Low (if Managed) | Very High |
| Analytics | Easy (SQL) | Hard (Need Warehouse) | Easy (SQL) |
Our Choice: Solution 1 (Cache + Async)
For 99% of interviews and real-world startups, Solution 1 is the correct answer.
- It solves the immediate bottlenecks (Latency).
- It allows us to keep using SQL for easy analytics queries.
- It introduces the critical concepts of Caching and Queues which are fundamental to all system design.
In the next articles, we will implement Solution 1:
- Part 7: Designing the Caching Layer (Redis) & Handling “The Thundering Herd”.
- Part 8: High-performance Analytics (Kafka/Queues). | Recommended? | ⚠️ Overkill | ✗ Overkill | ✅ BEST |
At 100 RPS: DynamoDB wins (lowest cost, least ops)
At 600 RPS (Year 2 Scale)
| Factor | Caching | Async | DynamoDB | Edge |
|---|---|---|---|---|
| Monthly Cost | $1,728 | $1,450 | $385 | $600 |
| Cost per Redirect | $0.017 | $0.014 | $0.004 | $0.006 |
| Latency (p99) | 12ms avg | 5ms avg | 100ms | 5ms global |
| RPS Capacity | 600 | 10K+ | 600 | Auto-scale |
| Complexity | Low | High | Low | Medium |
| Operational Burden | Medium | High | Low | Low (managed) |
| Good for? | ✓ Proven | ✓ Future-proof | ✅ BALANCED | ✅ Global |
At 600 RPS: Choose based on priorities:
- Cheapest: DynamoDB ($385/mo)
- Simplest Ops: DynamoDB
- Best Latency: Async or Edge
- Global Coverage: Edge Computing
At 5,800 RPS (Year 5 Scale)
| Factor | Caching | Async | DynamoDB | Edge |
|---|---|---|---|---|
| Monthly Cost | $8,640+ | $7,200 | $2,350 | $3,500 |
| Cost per Redirect | $0.009 | $0.007 | $0.002 | $0.004 |
| RPS Capacity | 600 (→ limit) | 50K+ | Auto-scale | Auto-scale |
| Complexity | High (multi-Redis) | High | Low | Medium |
| Operational Burden | High | High | Low | Low |
| Recommended? | ✗ Hitting limits | ✅ BEST | ✅ BEST | ✓ Good |
At 5,800+ RPS: Must choose Async or DynamoDB (caching hits scaling limits)
Cost-Benefit Analysis
Decision Framework:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
START: Where are you today?
│
├─ < 100 RPS?
│ └─ Choose: DynamoDB (cheapest) or Caching (proven)
│
├─ 100-600 RPS?
│ ├─ Is budget constrained?
│ │ └─ YES: DynamoDB
│ │ └─ NO: Caching-First (proven at this scale)
│ └─ Need global latency?
│ └─ YES: Edge Computing
│ └─ NO: DynamoDB or Caching
│
├─ 600-5K RPS?
│ └─ Choose: Async-Everything or DynamoDB
│ (Caching starts hitting limits)
│
└─ > 5K RPS?
└─ MUST use: Async or DynamoDB
Cost Progression (What you’ll pay as you grow):
1
2
3
4
5
MVP Phase (100 RPS): $226 → $295 (add DynamoDB)
Growth Phase (600 RPS): $295 → $385 (stay DynamoDB)
OR $295 → $1,728 (switch to Caching)
Scale Phase (5.8K RPS): $385 → $2,350 (DynamoDB auto-scales)
OR $1,728 → $7,200 (Async-Everything)
Cost-Saving Tips
- Start with DynamoDB on-demand (~$300/mo at 600 RPS)
- Switch to provisioned at 600+ RPS (save 30-40%)
- Use caching only if latency < 15ms required (DynamoDB → 100ms)
- Global expansion? Switch to Edge Computing ($600-700/mo)
- Never pay for unused capacity (use on-demand until you have predictable traffic)
Recommendation for Different Scenarios
Scenario A: Startup with Limited Engineering
Choose: Solution 3B (DynamoDB) first, then upgrade to Caching or Async
1
2
3
4
5
6
7
Why:
- Absolute cheapest at start: $295/month at 600 RPS
- Easy to implement (managed service)
- No operational burden (AWS handles scaling)
- Proven approach with low risk
- Perfect for MVP → Growth phase
- Upgrade path: DynamoDB → Add caching → Async later
Scenario B: Building for Scale from Day 1
Choose: Solution 2 (Async-Everything with Kafka)
1
2
3
4
5
6
7
Why:
- Future-proof for 50K+ RPS
- Handles bursty traffic efficiently
- Complete control over consistency
- Real-time analytics pipeline built-in
- Pay proportionally to scale ($1,450/mo at 600 RPS)
- Team ready to handle complexity
Scenario C: Global SaaS with Performance Needs
Choose: Solution 4 (Edge Computing with local caching)
1
2
3
4
5
6
Why:
- Lowest latency globally (<5ms)
- No need for geo-replication
- Automatic DDoS protection
- Cost-effective at scale ($600-700/mo)
- Handle viral content spikes
Scenario D: Enterprise with Compliance Requirements
Choose: Solution 3A (Sharded PostgreSQL)
1
2
3
4
5
6
Why:
- Full control over data
- Strong ACID consistency
- Audit trail capabilities
- On-premise/private cloud options
- Complex compliance requirements
What We’ll Deep Dive Into
Next 3 Articles (Deep Dives):
- Deep Dive 1: Caching-First (Most common)
- Redis architecture
- Cache invalidation strategies
- Handling hotspots
- Deep Dive 2: Async-Everything (Most scalable)
- Kafka architecture
- Event design
- Consumer patterns
- Deep Dive 3: DynamoDB (Best operational)
- Schema design
- Global secondary indexes
- Cost optimization
Summary
Three solutions with different focuses:
- Solution 1: Simple, proven, cost-effective (recommended for MVP scale)
- Solution 2: Extreme performance, complex operations
- Solution 3: Ultimate scalability, varies by choice (SQL vs NoSQL)
Decision factors:
- Your team’s expertise
- Cost constraints
- Scale projections
- Time to market
- Operational capabilities
Each solves the MVP limitations, but with different trade-offs.
Next: Deep dive into Caching-First approach (the recommended path forward).