The Art of Sharding — Part IV: The Final Test (System Design)
Table of Contents
- 12. System Design: Which Sharding Concepts to Use
- 13. Challenges & Trade-offs
- 14. When to Use Sharding
- 15. Real-World Examples
- 16. Best Practices Summary

12. System Design: Which Sharding Concepts to Use
URL Shortener
```mermaid
graph TB
    subgraph "URL Shortener Architecture"
        U[User Request:<br/>bit.ly/abc123]
        R[Router]
        S1[(Shard 1<br/>short_url: a-h)]
        S2[(Shard 2<br/>short_url: i-p)]
        S3[(Shard 3<br/>short_url: q-z)]
        S4[(Shard 4<br/>short_url: 0-9)]
    end
    U --> R
    R -->|abc123 → Shard 1| S1
    style S1 fill:#9f9,stroke:#333,stroke-width:3px
```
Recommended Sharding Concepts:
- Shard Key: short_url (high cardinality, even distribution)
- Strategy: Range-based or Hash-based (both sketched below)
  - Range: First character of the short URL (a-z, 0-9)
  - Hash: Hash of the short URL
- Why: Read-heavy workload with single-key lookups
- Cross-shard queries: Not needed (each lookup resolves on a single shard)
```sql
-- Shard assignment example
SELECT * FROM urls WHERE short_url = 'abc123';
-- Routes to: hash('abc123') % 4 = Shard 1
```
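To make the routing concrete, here is a minimal Python sketch of both options, assuming four shards and a hypothetical query_shard helper (neither is a specific library's API):

```python
# Minimal routing sketch for a URL shortener (illustrative only).
# NUM_SHARDS and query_shard are assumed helpers, not a real library API.
NUM_SHARDS = 4

def get_shard_by_hash(short_url):
    # Hash-based: even distribution; in production use a stable hash
    # (e.g., md5) rather than Python's process-seeded hash().
    return hash(short_url) % NUM_SHARDS

def get_shard_by_range(short_url):
    # Range-based: route on the first character (a-h, i-p, q-z, 0-9)
    first = short_url[0].lower()
    if first.isdigit():
        return 3
    if first <= 'h':
        return 0
    if first <= 'p':
        return 1
    return 2

def resolve(short_url):
    # Assumed query_shard(shard_id, sql, params) signature
    shard_id = get_shard_by_hash(short_url)
    return query_shard(shard_id,
        "SELECT long_url FROM urls WHERE short_url = %s", (short_url,))
```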
Social Media Platform

```mermaid
graph TB
    subgraph "Social Media Architecture"
        U1[User 12345<br/>Timeline Request]
        S1[(Shard 1<br/>Users 0-24999<br/>user_id based)]
        S2[(Shard 2<br/>Users 25000-49999)]
        S3[(Shard 3<br/>Users 50000-74999)]
        F1[(Feed Shard 1<br/>Pre-computed feeds<br/>user_id based)]
    end
    U1 -->|user_id=12345| S1
    U1 -->|Get timeline| F1
    style S1 fill:#9f9,stroke:#333,stroke-width:3px
    style F1 fill:#9cf,stroke:#333,stroke-width:3px
```
Recommended Sharding Concepts:
- Shard Key: user_id for user data, posts, follows
- Strategy: Hash-based sharding
- Data Locality: User + their posts + followers on the same shard
- Timeline: Separate shards with pre-computed feeds (fan-out sketch after the code below)
- Denormalization: Duplicate user info in posts to avoid joins
```python
# Shard assignment
def get_user_shard(user_id):
    return hash(user_id) % NUM_SHARDS

# Co-locate related data:
# - User profile:     shard_id = get_user_shard(user_id)
# - User's posts:     shard_id = get_user_shard(author_id)
# - User's followers: shard_id = get_user_shard(followee_id)
```
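The pre-computed timeline mentioned above is typically built with fan-out-on-write: when a user posts, a denormalized copy is appended to each follower's feed shard. A rough sketch under that assumption; get_followers, write_to_shard, and append_to_feed are hypothetical helpers:

```python
# Fan-out-on-write sketch for pre-computed feeds (illustrative only).
def publish_post(author_id, author_name, post_id, text):
    post = {
        'post_id': post_id,
        'author_id': author_id,
        'author_name': author_name,  # denormalized: no join at read time
        'text': text,
    }
    # Store the post on the author's shard (co-located with the profile)
    write_to_shard(get_user_shard(author_id), 'posts', post)

    # Append a copy to each follower's pre-computed feed
    for follower_id in get_followers(author_id):
        feed_shard = get_user_shard(follower_id)
        append_to_feed(feed_shard, follower_id, post)
```

For celebrity accounts this fan-out is exactly where the hot-partition problem below shows up.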
Challenge: Celebrity problem (hot partitions)
- Solution: Separate handling for celebrity accounts
- Use composite key: (user_id, timestamp) for posts
E-commerce Platform

```mermaid
graph TB
    subgraph "E-commerce Sharding"
        C1[Customer]
        US[(User Shard<br/>by user_id)]
        OS[(Order Shard<br/>by user_id)]
        PS[(Product Catalog<br/>by product_id)]
        IS[(Inventory<br/>by warehouse_id)]
    end
    C1 -->|User profile| US
    C1 -->|Order history| OS
    C1 -->|Browse products| PS
    C1 -->|Check stock| IS
    style US fill:#9cf,stroke:#333
    style OS fill:#9cf,stroke:#333
    style PS fill:#fc9,stroke:#333
    style IS fill:#f9c,stroke:#333
```
Recommended Sharding Concepts:
- User Data Shard Key: user_id
  - Strategy: Hash-based
  - Contains: user profile, addresses, payment methods
- Order Data Shard Key: user_id (not order_id!)
  - Strategy: Hash-based on user_id
  - Co-location: User’s orders live with the user’s data
  - Enables: Fast “my orders” queries
- Product Catalog Shard Key: product_id
  - Strategy: Hash-based
  - Separate from user data (different access patterns)
- Inventory Shard Key: warehouse_id or region
  - Strategy: Geographic sharding
  - Close to fulfillment centers
```python
# Co-location strategy
class EcommerceSharding:
    def get_user_shard(self, user_id):
        return hash(user_id) % USER_SHARDS

    def get_order_shard(self, order):
        # Shard by user_id, not order_id!
        # Allows fast "get my orders" without scatter-gather
        return hash(order.user_id) % USER_SHARDS

    def get_product_shard(self, product_id):
        return hash(product_id) % PRODUCT_SHARDS

    def get_inventory_shard(self, warehouse_id):
        # Geographic sharding
        return WAREHOUSE_TO_SHARD[warehouse_id]

# Example queries
# Fast: Get a user's orders (single shard)
#   SELECT * FROM orders WHERE user_id = 12345
#   -> Routes to the user's shard, co-located!
# Slow: Get all orders for a product (scatter-gather)
#   SELECT * FROM orders WHERE product_id = 789
#   -> Must query all user shards
```
Denormalization: Store product details in orders
```sql
-- Denormalized order table
CREATE TABLE orders (
    order_id BIGINT,
    user_id BIGINT,                   -- Shard key
    product_id BIGINT,
    -- Denormalized product info (avoids a cross-shard join)
    product_name VARCHAR(255),
    product_price DECIMAL,
    product_image_url VARCHAR(500)
);
```
Chat Application (WhatsApp/Slack)
```mermaid
graph TB
    subgraph "Chat Sharding Architecture"
        C1[User sends message<br/>in Group 12345]
        CS[(Conversation Shard<br/>by conversation_id)]
        MS[(Message Shard<br/>by conversation_id)]
        US[(User Shard<br/>by user_id)]
    end
    C1 -->|Group 12345| CS
    C1 -->|Messages for<br/>Group 12345| MS
    style CS fill:#9f9,stroke:#333,stroke-width:3px
    style MS fill:#9f9,stroke:#333,stroke-width:3px
```
Recommended Sharding Concepts:
- Shard Key: conversation_id (group_id or channel_id)
- Strategy: Hash-based sharding
- Data Locality: All messages for a conversation on the same shard
- Why: Users fetch messages per conversation
```python
# Shard by conversation
def get_conversation_shard(conversation_id):
    return hash(conversation_id) % NUM_SHARDS

# Co-locate messages with their conversation:
# Shard 1: Conversation A + all messages in A
# Shard 2: Conversation B + all messages in B

# Fast query (single shard):
#   SELECT * FROM messages
#   WHERE conversation_id = 'conv_12345'
#   ORDER BY timestamp DESC
#   LIMIT 50
```
Alternative for 1-on-1 chat: Composite key
```python
# For a 1-on-1 chat between user_a and user_b
def get_dm_shard(user_a_id, user_b_id):
    # Ensure consistent ordering
    min_id, max_id = sorted([user_a_id, user_b_id])
    conversation_key = f"{min_id}_{max_id}"
    return hash(conversation_key) % NUM_SHARDS
```
Read Receipts: Separate sharding
```python
# Read receipts sharded by user_id
# Allows fast "get unread count" for a user
def get_readreceipt_shard(user_id):
    return hash(user_id) % NUM_SHARDS
```
Analytics (Time-Series Data)

```mermaid
graph TB
    subgraph "Analytics Sharding (Time-Series)"
        Q[Query: Sales report<br/>Jan 2024 - Mar 2024]
        S1[(Shard: Q4 2023)]
        S2[(Shard: Q1 2024)]
        S3[(Shard: Q2 2024)]
        S4[(Shard: Q3 2024)]
    end
    Q -.->|Skip| S1
    Q -->|Query| S2
    Q -.->|Skip| S3
    Q -.->|Skip| S4
    style S2 fill:#9f9,stroke:#333,stroke-width:3px
```
Recommended Sharding Concepts:
- Shard Key: timestamp or date
- Strategy: Range-based sharding (by month/quarter/year)
- Why: Most queries are time-range based
- Partition Pruning: Skip shards outside the requested time range (routing sketch after the code below)
```sql
-- Range-based sharding for analytics
-- Shard 1: 2023-10-01 to 2023-12-31 (Q4 2023)
-- Shard 2: 2024-01-01 to 2024-03-31 (Q1 2024)
-- Shard 3: 2024-04-01 to 2024-06-30 (Q2 2024)
-- Shard 4: 2024-07-01 to 2024-09-30 (Q3 2024)

-- Query: only hits Shard 2 (partition pruning)
SELECT SUM(revenue) FROM sales
WHERE date BETWEEN '2024-01-01' AND '2024-03-31';

-- Old data archival:
-- Shard 1 (2023 data) → move to cold storage
```
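Partition pruning is just a routing step: map the requested date range onto the shards whose ranges overlap it and skip the rest. A small sketch, with the quarter boundaries above hard-coded in an assumed SHARD_RANGES map:

```python
from datetime import date

# Assumed shard map mirroring the ranges above (not a real system's API)
SHARD_RANGES = {
    'shard_1': (date(2023, 10, 1), date(2023, 12, 31)),  # Q4 2023
    'shard_2': (date(2024, 1, 1),  date(2024, 3, 31)),   # Q1 2024
    'shard_3': (date(2024, 4, 1),  date(2024, 6, 30)),   # Q2 2024
    'shard_4': (date(2024, 7, 1),  date(2024, 9, 30)),   # Q3 2024
}

def shards_for_range(start, end):
    # Keep only shards whose [lo, hi] range overlaps [start, end]
    return [shard for shard, (lo, hi) in SHARD_RANGES.items()
            if start <= hi and end >= lo]

# The Jan-Mar 2024 report prunes everything except shard_2
print(shards_for_range(date(2024, 1, 1), date(2024, 3, 31)))  # ['shard_2']
```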
Composite Shard Key for Dimensional Analysis:
```python
# Shard by date + region for geo-analytics
def get_analytics_shard(date, region):
    month = date.strftime('%Y-%m')
    return f"shard_{region}_{month}"

# Enables efficient queries like:
#   "Sales in US region for Jan 2024"
#   -> Routes to: shard_US_2024-01
```
Ride-Sharing (Uber/Lyft)
```mermaid
graph TB
    subgraph "Geo-Sharding for Ride-Sharing"
        U1[User in<br/>San Francisco]
        U2[User in<br/>New York]
        S1[(Shard: SF Region<br/>Active rides<br/>Driver locations)]
        S2[(Shard: NYC Region<br/>Active rides<br/>Driver locations)]
    end
    U1 -->|Request ride| S1
    U2 -->|Request ride| S2
    S1 -->|Find nearby drivers<br/>Low latency| S1
    S2 -->|Find nearby drivers<br/>Low latency| S2
    style S1 fill:#9f9,stroke:#333,stroke-width:3px
    style S2 fill:#9f9,stroke:#333,stroke-width:3px
```
Recommended Sharding Concepts:
- Active Rides Shard Key: region or geohash
- Strategy: Geographic sharding
- Data Locality: Drivers and riders in the same region on the same shard
- Why: Driver-rider matching is local
```python
import geohash2

class RideSharingSharding:
    def get_geohash_shard(self, latitude, longitude):
        # Geohash precision: 5 chars ≈ 5km x 5km area
        geo = geohash2.encode(latitude, longitude, precision=5)
        return hash(geo) % NUM_SHARDS

    def find_nearby_drivers(self, rider_lat, rider_lon):
        shard_id = self.get_geohash_shard(rider_lat, rider_lon)
        # Query a single shard for nearby drivers
        return query_shard(shard_id,
            "SELECT * FROM drivers WHERE active = true")

# Example:
# Rider in SF: lat=37.7749, lon=-122.4194
# Geohash: 9q8yy → Shard 3
# All SF drivers also on Shard 3 → Fast matching!
```
Historical Ride Data: Different sharding strategy
```python
# Historical rides: shard by user_id or ride_date
def get_history_shard(user_id):
    return hash(user_id) % HISTORY_SHARDS

# Allows fast "my ride history" queries:
#   SELECT * FROM ride_history WHERE user_id = 12345
```
Gaming Leaderboard
```mermaid
graph TB
    subgraph "Leaderboard Sharding"
        Q[Query: Top 100 players<br/>globally]
        S1[(Shard 1<br/>Region: NA)]
        S2[(Shard 2<br/>Region: EU)]
        S3[(Shard 3<br/>Region: APAC)]
        A[Aggregator]
    end
    Q --> A
    A -->|Scatter| S1
    A -->|Scatter| S2
    A -->|Scatter| S3
    S1 -->|Top 100 NA| A
    S2 -->|Top 100 EU| A
    S3 -->|Top 100 APAC| A
    A -->|Merge & Sort<br/>Return Top 100| Q
    style A fill:#f96,stroke:#333,stroke-width:3px
```
Recommended Sharding Concepts:
- Shard Key: region or game_id
- Strategy: Regional sharding + denormalization
- Global Leaderboard: Scatter-gather across regions
- Local Leaderboard: Single-shard query
```python
# Regional sharding
def get_leaderboard_shard(region):
    return REGION_TO_SHARD[region]

# Fast: Regional leaderboard (single shard)
#   SELECT player_id, score FROM leaderboard
#   WHERE region = 'NA'
#   ORDER BY score DESC
#   LIMIT 100

# Slow: Global leaderboard (scatter-gather)
#   SELECT player_id, score FROM leaderboard
#   ORDER BY score DESC
#   LIMIT 100
#   -> Must query every regional shard and merge the results
```
Optimization: Pre-computed Global Leaderboard
```python
# Separate shard for a pre-aggregated global leaderboard
# Updated periodically (e.g., every 5 minutes)
def update_global_leaderboard():
    results = []
    for region in ['NA', 'EU', 'APAC']:
        shard = get_leaderboard_shard(region)
        top_players = query_shard(shard,
            "SELECT * FROM leaderboard ORDER BY score DESC LIMIT 1000")
        results.extend(top_players)
    # Sort descending and keep the global top 1000
    global_top = sorted(results, key=lambda x: x.score, reverse=True)[:1000]
    store_in_shard('global_leaderboard', global_top)
```
Multi-Tenant SaaS

```mermaid
graph TB
    subgraph "Multi-Tenant Sharding"
        T1[Small Tenant A<br/>100 users]
        T2[Small Tenant B<br/>50 users]
        T3[Enterprise Tenant C<br/>10,000 users]
        T4[Medium Tenant D<br/>500 users]
    end
    subgraph "Shards"
        S1[(Shard 1<br/>Tenant A + B)]
        S2[(Shard 2<br/>Tenant C<br/>Dedicated)]
        S3[(Shard 3<br/>Tenant D)]
    end
    T1 --> S1
    T2 --> S1
    T3 --> S2
    T4 --> S3
    style S2 fill:#f96,stroke:#333,stroke-width:3px
```
Recommended Sharding Concepts:
- Shard Key: tenant_id
- Strategy: Directory-based (flexible assignment)
- Isolation: Each tenant’s data lives on specific shard(s)
- Enterprise Tenants: Dedicated shards
```python
class MultiTenantSharding:
    def __init__(self):
        self.tenant_to_shard = {}  # Directory mapping

    def assign_tenant_shard(self, tenant_id, tenant_tier):
        if tenant_tier == 'enterprise':
            # Dedicated shard for enterprise
            shard = self.provision_new_shard()
        elif tenant_tier == 'medium':
            # Find least loaded shared shard
            shard = self.get_least_loaded_shard()
        else:  # small
            # Place multiple small tenants on the same shard
            shard = self.get_small_tenant_shard()
        self.tenant_to_shard[tenant_id] = shard
        return shard

    def query_tenant_data(self, tenant_id, query):
        # All tenant data on the same shard
        shard = self.tenant_to_shard[tenant_id]
        return execute_query(shard, query)

# Benefits:
# - Fast queries (no cross-shard joins)
# - Strong isolation
# - Easy to migrate a tenant to a different shard
# - Enterprise SLA: dedicated resources
```
13. Challenges & Trade-offs
Hot Partition Problem
```mermaid
graph TB
    subgraph "Celebrity Problem in Social Media"
        C[Celebrity<br/>10M followers]
        S1[(Shard 1<br/>20% load)]
        S2[(Shard 2<br/>15% load)]
        S3[(Shard 3<br/>OVERLOADED<br/>60% load)]
        S4[(Shard 4<br/>5% load)]
    end
```
Solutions:
```python
# 1. Composite shard key
def get_celebrity_shard(user_id, timestamp):
    # Split celebrity data across multiple shards by time
    composite = f"{user_id}_{timestamp.hour}"
    return hash(composite) % NUM_SHARDS

# 2. Application-level splitting
if is_celebrity(user_id):
    # Write to multiple shards
    for shard_id in get_celebrity_shards(user_id):
        write_to_shard(shard_id, data)
else:
    # Normal sharding
    write_to_shard(get_shard(user_id), data)

# 3. Read replicas for the hot shard
# Add extra read replicas to the hot shard specifically
```
Complexity
- Operational: More moving parts to manage
- Development: Application must be shard-aware
- Debugging: Distributed tracing required
Cross-Shard Queries
- Joins across shards are expensive and complex (scatter-gather; see the sketch below)
- Mitigation: Denormalization and careful shard key design
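When a cross-shard query is unavoidable, the usual pattern is scatter-gather: send the same query to every shard in parallel, then merge and re-limit the partial results. A minimal sketch, assuming a query_shard helper and a list of shard ids:

```python
from concurrent.futures import ThreadPoolExecutor

# Scatter-gather sketch (illustrative); query_shard is an assumed helper.
def scatter_gather(query, shard_ids, sort_key, limit):
    with ThreadPoolExecutor(max_workers=len(shard_ids)) as pool:
        # Scatter: run the query on every shard in parallel
        partial_results = list(pool.map(lambda s: query_shard(s, query),
                                        shard_ids))
    # Gather: merge partial results, then re-apply ordering and limit
    merged = [row for rows in partial_results for row in rows]
    merged.sort(key=sort_key, reverse=True)
    return merged[:limit]

# Example: global top-100 scores across regional shards
# top = scatter_gather(
#     "SELECT player_id, score FROM leaderboard ORDER BY score DESC LIMIT 100",
#     SHARD_IDS, sort_key=lambda r: r['score'], limit=100)
```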
14. When to Use Sharding
Decision Tree
```mermaid
graph TB
    Start[Need to scale<br/>database?]
    Start --> Q1{Data size<br/>>1TB?}
    Q1 -->|No| V1[Consider<br/>vertical scaling]
    Q1 -->|Yes| Q2{Can optimize<br/>queries/indexes?}
    Q2 -->|Yes| V2[Optimize first]
    Q2 -->|No| Q3{Heavy<br/>cross-shard<br/>queries?}
    Q3 -->|Yes| V3[Reconsider<br/>design or<br/>use denormalization]
    Q3 -->|No| Q4{Team has<br/>distributed systems<br/>expertise?}
    Q4 -->|No| V4[Use managed<br/>sharding service]
    Q4 -->|Yes| V5[Implement<br/>sharding!]
    style V5 fill:#9f9,stroke:#333,stroke-width:3px
    style V3 fill:#f99,stroke:#333
```
Use Sharding When:
✅ Single database cannot handle load (CPU, memory, I/O)
✅ Data size exceeds single machine capacity
✅ Query patterns allow good shard key design
✅ Horizontal scaling is more cost-effective than vertical
✅ Geographic distribution is required
✅ Team has expertise to manage distributed systems
Avoid Sharding When:
❌ Vertical scaling still feasible and cost-effective
❌ Read replicas can handle read load
❌ Heavy cross-shard queries are common
❌ Team lacks distributed systems experience
❌ Data size is manageable (< 1TB typically)
❌ Query patterns don’t support good shard key
15. Real-World Examples
Instagram’s Sharding
```python
# Instagram's PostgreSQL sharding strategy
# Shard key: user_id
# Why: Most queries are "show photos for user X"

def get_instagram_shard(user_id):
    # Hash-based sharding
    return user_id % NUM_SHARDS

# Co-location:
# - User profile:     shard by user_id
# - User's photos:    shard by author_id (not photo_id!)
# - User's followers: shard by followee_id

# Benefits:
# - "Show Alice's photos"    → Single shard query
# - "Show Alice's followers" → Single shard query

# Trade-off:
# - "Show photos with tag #sunset" → Scatter-gather
# - Solution: Separate tag index with different sharding
```
Uber’s Schemaless
```python
# Uber's sharding for trip data
# Strategy: Geographic + consistent hashing

def get_uber_shard(trip_id):
    # Ringpop consistent hashing
    # Minimal data movement when adding nodes
    return consistent_hash(trip_id)

# Active trips: Geo-sharded
# - Shard by current region
# - Fast driver matching

# Historical trips: User-sharded
# - Shard by user_id
# - Fast "my trip history"

# Insight: Different sharding for different access patterns!
```
Discord’s Cassandra Sharding
```sql
-- Discord message storage
-- Shard key: (guild_id, channel_id)
-- Why: Users read messages per channel

-- Cassandra partitioning:
CREATE TABLE messages (
    guild_id BIGINT,
    channel_id BIGINT,
    message_id BIGINT,
    content TEXT,
    PRIMARY KEY ((guild_id, channel_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Benefits:
-- - "Get messages in channel" → Single partition query
-- - Messages sorted by message_id (time-ordered)
-- - Efficient pagination

-- Scale:
-- - Millions of guilds (servers)
-- - Billions of messages
-- - Cassandra handles sharding automatically
```
16. Best Practices Summary
- Choose shard key carefully: High cardinality, even distribution, matches access patterns
- Start with more shards: Easier to consolidate than split
- Plan for rebalancing: Use consistent hashing or virtual partitions (see the ring sketch after this list)
- Monitor continuously: Track data distribution and performance
- Keep related data together: Design for data locality
- Avoid cross-shard operations: Denormalize where necessary
- Test failure scenarios: Shard failures, network partitions, rebalancing
- Automate operations: Monitoring, failover, backups
- Document shard topology: Keep architecture diagrams current
- Plan capacity: Anticipate growth and shard expansion
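For the rebalancing practice above, a consistent-hash ring with virtual nodes is the standard tool: adding a shard remaps only the keys that land on its new ring points instead of rehashing everything. A minimal illustration, not a production implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""
    def __init__(self, shards, vnodes=100):
        self.ring = []  # sorted list of (ring_point, shard)
        for shard in shards:
            self.add_shard(shard, vnodes)

    def _point(self, key):
        # Stable hash so routing is identical across processes
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_shard(self, shard, vnodes=100):
        for i in range(vnodes):
            self.ring.append((self._point(f"{shard}#{i}"), shard))
        self.ring.sort()

    def get_shard(self, key):
        point = self._point(str(key))
        # First ring point at or after the key's point, wrapping around
        idx = bisect.bisect(self.ring, (point,)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(['shard_a', 'shard_b', 'shard_c'])
print(ring.get_shard('user:12345'))
# Adding shard_d moves roughly 1/4 of the keys instead of rehashing all of them
ring.add_shard('shard_d')
```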
Key Takeaways
- Sharding = Horizontal partitioning across multiple physical servers
- Shard key selection is the most critical design decision
- Consistent hashing minimizes data movement during rebalancing
- Different systems need different strategies: URL shortener vs social media vs analytics
- Co-location matters: Keep related data on same shard
- Trade-offs: Complexity vs. scalability
- Design principle: Keep transactions and queries within single shard when possible
- Real-world use: Essential for Internet-scale applications (billions of records, millions of concurrent users)
Quick Reference: Sharding Strategy Selection
| Use Case |
Shard Key |
Strategy |
Why |
| URL Shortener |
short_url |
Hash-based |
Single-key lookups, read-heavy |
| Social Media |
user_id |
Hash-based |
User-centric queries, co-location |
| E-commerce Orders |
user_id |
Hash-based |
“My orders” queries, co-location |
| E-commerce Products |
product_id |
Hash-based |
Independent from user data |
| Chat (Groups) |
conversation_id |
Hash-based |
Messages per conversation |
| Analytics |
timestamp |
Range-based |
Time-range queries, archival |
| Ride-Sharing |
region/geohash |
Geographic |
Local matching, low latency |
| Multi-Tenant SaaS |
tenant_id |
Directory-based |
Flexible assignment, isolation |
| Logging |
timestamp |
Range-based |
Time-series, partition pruning |
| Leaderboard |
region |
Geographic + Pre-compute |
Regional fast, global acceptable |
Sources & Further Reading