Part 2: Basic System Design (MVP)
1. Introduction: Start Simple, Then Break It
Before we worry about sharding and eventual consistency, let’s solve the core problem. How do we store a tag?
Refusing to over-engineer at the start is a seniority signal. We begin with a Minimum Viable Product (MVP)—a clean, monolithic design that satisfies the functional requirements. This establishes a baseline so we can see exactly where it breaks when we add load.
2. High-Level Architecture (The Naive Approach)
For the MVP, we use the industry standard “boring” stack:
- Stateless Service: A standard REST API (Go/Java).
- Relational Database (PostgreSQL): Our single source of truth. Why SQL? Because relationships (Tags <-> Content) are the core of our domain, and ACID guarantees keep our data sane.
- Read-Through Cache (Redis): To protect the DB from repetitive “Show tags for Issue X” calls.
```mermaid
graph LR
    User --> LoadBalancer
    LoadBalancer --> API[Tag API Service]
    API --> DB[(PostgreSQL)]
    API --> Cache[(Redis)]
    subgraph "MVP Components: The Trinity"
        API
        DB
        Cache
    end
```
3. Basic Design Details
Component Breakdown
- Tag API Service: A Go/Java microservice. It handles validation (e.g., tag length), normalization (lowercase), and DB transactions.
- PostgreSQL: Handles relational data. Good consistency/ACID guarantees are helpful for the `TAG` + `CONTENT_TAG` writes.
- Redis: Caches the “GET tags for content” response to offload the DB.
Write Path: Adding a Tag (FR1)
- User sends `POST /content/123/tags` with `tag_name="Urgent"`.
- App: Normalizes “Urgent” -> “urgent”.
- App: Checks if “urgent” exists in the `TAG` table.
  - If no: `INSERT INTO TAG ...` (returning ID).
  - If yes: Get existing ID.
- App: `INSERT INTO CONTENT_TAG (content_id, tag_id) ...`
- App: Returns success.
3.1 Bonus: Implementing Typeahead (Autocomplete)
Users expect suggestions as they type “urg…”. We can’t run `LIKE 'urg%'` on Postgres at scale.
- Solution: Redis Sorted Sets (`ZSET`).
- Mechanism: Store all tags in a ZSET with `score=0`.
- Query: Use `ZRANGEBYLEX tags [urg [urg\xff` to find all strings starting with “urg”.
- Why: It’s O(log(N) + M), extraordinarily fast (microseconds) for prefix lookups.
```mermaid
sequenceDiagram
    participant U as User
    participant S as Tag Service
    participant DB as Database
    U->>S: POST tag="Urgent"
    S->>S: Normalize "urgent"
    S->>DB: SELECT id FROM tags WHERE name="urgent"
    alt Tag doesn't exist
        S->>DB: INSERT INTO tags (name) VALUES ("urgent")
        DB-->>S: Return new ID
    end
    S->>DB: INSERT INTO content_tags (c_id, t_id)
    DB-->>S: Success
    S-->>U: Tag Added 201
```
Read Path: Get Content Tags (FR3)
- User sends `GET /content/123/tags`.
- App: Check Redis key `content:123:tags`.
- Cache Hit: Return JSON immediately.
- Cache Miss:
  - JOIN `CONTENT_TAG` and `TAG` tables in DB.
  - Write result to Redis (TTL 1 hour).
  - Return JSON.
4. Basic Design Limitations/Tradeoffs
This MVP works well for ~1,000 requests/sec. But at our target scale (100k reads/sec, 50k writes/sec), it breaks:
- Database Bottleneck: A single Postgres instance cannot handle 50k writes/sec comfortably, especially with index overhead.
- “Justin Bieber” Problem (Hot Tags): Searching for content by a popular tag (e.g., “bug”) might return 10M rows. The DB query `SELECT * FROM content_tags WHERE tag_id = X` becomes huge and slow.
- Popularity Calculation: Running `SELECT count(*) FROM content_tags GROUP BY tag_id` on the main transactional table will lock rows/tables and degrade write performance.
- Single Point of Failure: One DB instance means zero redundancy.
These limitations set the stage for our Deep Dives in the next parts of the series.