System Design Cheatsheet (Expert-Level, 80/20 Coverage)
If you internalize this cheatsheet, you’ll cover the majority of real-world system design discussions and interview expectations.
This is not “what is a load balancer” material. This is the stuff that decides whether your system survives its first incident.
0) The One-Minute Mental Model
System design is trade-offs under constraints.
In every design, you are balancing:
- Correctness (consistency, integrity, invariants)
- Latency (p50 vs p99/p999)
- Throughput (QPS, write amplification)
- Availability (degradation vs outage)
- Cost (compute, storage, bandwidth, operational load)
- Operability (deploy, debug, rollback, migrate)
If you can answer these 8 questions, you’re already “senior” in system design:
- What’s the SLO (p99 latency, error rate, availability)?
- What’s the peak (traffic, fanout, burstiness) and growth curve?
- What’s the data model (entities + access patterns + invariants)?
- Where do we need strong consistency vs eventual?
- What are the hot keys / skew / supernodes?
- What is the blast radius of a dependency failure?
- What’s the backpressure strategy?
- How do we migrate without downtime?
1) First 10 Minutes: Frame the Problem Like an Architect
Workload profile (write this down)
- Read/write ratio (e.g., 90/10)
- Data size and growth (GB/day, records/day)
- Access patterns (by user, by time, by geo)
- Latency target (p99, not average)
- Consistency requirements (what must never be wrong?)
- Fanout patterns (celebrity problem, multi-tenant whales)
“Constraints-first” checklist
- Regulatory: PII, retention, deletion, audit
- Geo: single-region vs multi-region active/active
- Failure domains: zone, region, provider
- Budget: infra cost and headcount (operations is expensive)
2) Core Building Blocks (The Standard Toolkit)
The canonical shape of modern systems
flowchart LR
Client --> Edge[CDN/WAF/API Gateway]
Edge --> Svc[App Services]
Svc --> Cache[(Cache)]
Svc --> DB[(Primary DB)]
Svc --> Search[(Search/Index)]
Svc --> Queue[(Queue/Stream)]
Queue --> Worker[Async Workers]
Worker --> DB
Worker --> Blob[(Object Storage)]
Svc --> Obs[Logs/Metrics/Tracing]
A useful classification
- Serving path: must meet p99 latency (API + cache + DB reads)
- Write path: must preserve invariants (validation + idempotency + durability)
- Async path: absorbs burst, isolates failures (queues/streams + workers)
- Control plane: config, deploys, migrations, feature flags
3) Data: Model First, Then Pick Storage
Data model cheatsheet
Write down:
- Entities: User, Post, Order, Payment, Session
- Relationships: 1:1, 1:N, N:N
- Queries: “get timeline”, “search”, “recent orders”, “by tenant”, “by time range”
- Invariants: “no double charge”, “unique username”, “inventory never negative”
Storage selection (fast heuristics)
| Need | Usually pick | Why |
|---|---|---|
| Strong transactions + joins | Relational DB | Constraints + indexes + mature tooling |
| Massive scale key lookups | Key-value store | Predictable latency, partition-friendly |
| Time range analytics | Columnar / OLAP | Compression + scans |
| Flexible documents | Document DB | Schema evolution + nested objects |
| Full-text search | Search engine | Inverted index + ranking |
| Large blobs | Object storage | Cheap, scalable, CDN-friendly |
| Events as source of truth | Log/stream + consumers | Replay + decoupling |
War story rule
If you don’t know your access patterns, don’t design your sharding key.
4) Consistency: Decide What Must Be True
Think in invariants
Examples:
- Payments: “charge exactly once” (or “at most once + reconciliation”)
- Inventory: “stock never negative”
- Usernames: “unique globally”
CAP/PACELC (use it correctly)
- Under a partition: choose Consistency (CP) or Availability (AP); you cannot keep both
- Else (normal operation, no partition): choose lower Latency (EL) or stronger Consistency (EC)
Consistency patterns you actually use
| Problem | Pattern | Notes |
|---|---|---|
| Cross-service transaction | Saga | Compensations + idempotency |
| Reliable side effects | Outbox + CDC | Prevents “DB commit but event lost” |
| Ordering | Partition by key | One key → one ordered stream |
| Concurrency control | Optimistic (version) | Most scalable default |
| Duplicate requests | Idempotency key | Mandatory for retries |
Default stance: strong consistency only where invariants demand it; eventual consistency everywhere else.
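To make the "Optimistic (version)" row concrete, here is a minimal sketch of a version-checked update that protects the "stock never negative" invariant. It uses Python's standard `sqlite3` module; the `inventory` table, its columns, and the `reserve_stock` helper are assumptions made up for illustration, not part of any particular system.

```python
import sqlite3


def reserve_stock(conn: sqlite3.Connection, sku: str, qty: int, max_attempts: int = 3) -> bool:
    """Decrement stock using optimistic (version-checked) concurrency control.

    Returns True if the reservation was applied, False if stock is insufficient
    or every attempt lost the race to a concurrent writer.
    """
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT stock, version FROM inventory WHERE sku = ?", (sku,)
        ).fetchone()
        if row is None or row[0] < qty:
            return False  # invariant: stock never goes negative
        stock, version = row
        # The WHERE clause only matches if nobody changed the row since we read it.
        cur = conn.execute(
            "UPDATE inventory SET stock = ?, version = ? WHERE sku = ? AND version = ?",
            (stock - qty, version + 1, sku, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True
        # Another writer won the race: re-read and try again.
    return False


# Hypothetical schema and seed data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER, version INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5, 0)")
conn.commit()
```

No lock is held between the read and the write; a conflicting writer simply causes the `UPDATE` to match zero rows, and the caller retries. That is why this tends to scale better than pessimistic locking when conflicts are rare.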
5) Caching: The Fast Path and the Failure Path
What caching is really for
- Reduce read load
- Reduce tail latency
- Provide graceful degradation when the DB is struggling
Common cache patterns
| Pattern | When | Risk |
|---|---|---|
| Cache-aside | Default | Stampede on miss |
| Read-through | Platform-controlled | Latency spikes on backend |
| Write-through | Need immediate cache consistency | Higher write latency |
| Write-behind | High write throughput | Data loss if not durable |
| Refresh-ahead | Predictable hot keys | Complexity |
Cache failure playbook
- Stampede: jitter TTL + request coalescing
- Cold start: warm top keys, or degrade to stale
- Stale tolerance: stale-while-revalidate (serve stale, refresh async)
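The stampede fix above (jittered TTL plus request coalescing) can be sketched in a few lines. This is an in-process, thread-based illustration under stated assumptions: the `CoalescingCache` class, its parameters, and the `loader` callback are hypothetical names, and a real deployment would usually sit in front of a shared cache rather than a local dict.

```python
import random
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CoalescingCache:
    """Cache-aside with jittered TTLs and per-key request coalescing."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float = 60.0, jitter: float = 0.2):
        self._loader = loader
        self._ttl_s = ttl_s
        self._jitter = jitter
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def _fresh(self, key: str) -> Optional[Tuple[Any, float]]:
        entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key: str) -> Any:
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have loaded the value while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            value = self._loader(key)
            # Spread expirations so hot keys do not all expire at the same moment.
            ttl = self._ttl_s * (1 + random.uniform(-self._jitter, self._jitter))
            self._data[key] = (value, time.monotonic() + ttl)
            return value
```

Only one caller per key reaches the backend on a miss; the others block briefly and then read the fresh entry. The sketch never evicts entries or per-key locks, so it is a starting point rather than something to run unbounded.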
6) Scale: Partitioning, Sharding, and Hot Keys
Sharding key rules
Your sharding key must:
- Match the dominant access pattern
- Avoid hot partitions (skew)
- Be stable over time (or have a re-sharding plan)
Patterns that fix real problems
| Problem | Fix | Notes |
|---|---|---|
| Hot key / celebrity user | Hybrid fanout | Push for normal, pull for supernodes |
| Tenant whale | Virtual shards | Split one tenant across buckets |
| Time-based hotspot | Hash prefix + time | Or use LSM-friendly stores |
| Uneven load | Consistent hashing | Virtual nodes spread load; node adds/removes move few keys |
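As an illustration of the consistent hashing row, the following is a minimal hash ring with virtual nodes. The `HashRing` class, the `vnodes` parameter, and the `node#i` naming of ring points are assumptions for this sketch; production implementations add replication, weights, and node removal.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # 64-bit position on the ring, derived from SHA-256 for a stable, well-spread hash.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each node owns many virtual points on the ring."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        self._vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that matters for operations: after `ring.add("d")`, every key that changes owner moves to `"d"`, and the rest stay where they were. With plain `hash(key) % n`, adding a node reshuffles most keys instead.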
The most common scaling lie
“We’ll just add more shards.”
If your partition key is wrong, you don’t need more shards. You need a new key (and a migration plan).
7) Messaging: Queues vs Streams (Pick the Right Weapon)
Decision guide
| You need | Use | Why |
|---|---|---|
| Background jobs | Queue | Simple work distribution |
| Event history + replay | Stream/log | Reprocess + time travel |
| Exactly-once illusion | Stream + idempotent consumers | Operationally realistic |
| Ordering per key | Stream partitioning | Deterministic order |
Reliability patterns
- Retries with backoff (bounded)
- DLQ (dead-letter queue) for poison pills
- Idempotent consumers (must)
- Deduplication (idempotency key or content hash)
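The four reliability patterns above fit together in one consumer loop. The sketch below is in-memory and deliberately simplified: `TransientError`, `consume`, the `seen` set, and the `dlq` list are hypothetical stand-ins for a broker client, a durable dedup store, and a real dead-letter queue.

```python
import time
from typing import Callable, Dict, Iterable, List, Set


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


def consume(
    messages: Iterable[Dict],
    handler: Callable[[Dict], None],
    seen: Set[str],
    dlq: List[Dict],
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> None:
    for msg in messages:
        msg_id = msg["id"]
        if msg_id in seen:
            continue  # duplicate delivery: already processed, skip
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
            except TransientError:
                if attempt == max_attempts:
                    dlq.append(msg)  # bounded retries exhausted
                else:
                    time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff
            except Exception:
                dlq.append(msg)  # poison pill: retrying will not help
                break
            else:
                seen.add(msg_id)  # record success so redeliveries become no-ops
                break
```

In a real system the "mark as seen" step needs to be atomic with the handler's side effect (for example, in the same database transaction); otherwise a crash between the two reintroduces duplicates.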
8) Resilience: Design for Partial Failure
The Big 6 resilience patterns
- Timeouts (always) — no unbounded waits
- Retries (carefully) — with jitter + budgets
- Circuit breaker — stop cascading failures
- Bulkheads — isolate tenants/endpoints/dependencies
- Load shedding — degrade gracefully, protect core
- Backpressure — push work back when overloaded
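Of the six patterns, the circuit breaker is the one most often described and least often written down. A minimal single-threaded sketch follows; the `CircuitBreaker` class, its thresholds, and the injectable `clock` are assumptions for illustration, and real libraries add thread safety, rolling failure windows, and metrics.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._threshold = failure_threshold
        self._reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, op: Callable[[], Any]) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout_s:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Timeout elapsed: half-open, so let this one call through as a probe.
        try:
            result = op()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed probe
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While open, callers fail in microseconds instead of waiting on a timeout, which is what stops a slow dependency from exhausting the caller's threads and connections.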
Retry math (why people get this wrong)
Retries increase traffic during incidents. If you retry everything, you can DDoS your own dependencies.
Rules:
- Retry only idempotent operations or with idempotency keys
- Retry only on transient failures (timeouts, 429, 503)
- Use retry budgets per service
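The three rules can be combined into one helper. This is a sketch, not a reference implementation: `RetryableError`, `RetryBudget`, and `call_with_retries` are hypothetical names, the budget here is a simple lifetime ratio (real ones usually use a sliding window), and the caller is assumed to pass only idempotent operations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeouts, 429, 503)."""


class RetryBudget:
    """Allow retries only up to a fraction of the requests seen so far."""

    def __init__(self, ratio: float = 0.1, min_retries: int = 3):
        self.ratio = ratio
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_spend(self) -> bool:
        if self.retries < self.min_retries + self.ratio * self.requests:
            self.retries += 1
            return True
        return False  # budget exhausted: fail instead of piling on


def call_with_retries(
    op: Callable[[], T],
    budget: RetryBudget,
    max_attempts: int = 3,
    base_s: float = 0.1,
    cap_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return op()
        except RetryableError:
            if attempt == max_attempts - 1 or not budget.try_spend():
                raise
            # Full jitter: sleep a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
    raise ValueError("max_attempts must be at least 1")
```

With a 10% budget, a dependency that is fully down sees at most about 1.1x normal traffic from this client rather than `max_attempts` times normal traffic, which is the difference between a degraded dependency and a self-inflicted outage.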
9) Latency: p99 Is the Product
Tail latency compounds
If a request calls N dependencies, the probability that at least one of them hits its tail latency rises quickly with N, and sequential calls add their latencies on top of that.
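A quick calculation shows how fast this compounds. The sketch assumes each call independently has the same probability `p_tail` of landing in its tail (for p99, that is 0.01); real dependencies are correlated, so treat the numbers as an intuition pump rather than a prediction.

```python
def p_slow_request(p_tail: float, n_calls: int) -> float:
    """Probability that at least one of n independent calls hits its tail."""
    return 1 - (1 - p_tail) ** n_calls


# With a 1% tail per call, check 1, 10, and 100 dependency calls per request.
for n in (1, 10, 100):
    print(n, round(p_slow_request(0.01, n), 3))
```

Under these assumptions, a request touching 10 dependencies sees a p99-or-worse call roughly 10% of the time, and one touching 100 sees it roughly 63% of the time. A dependency's p99 becomes the typical experience of a high-fanout request.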
Practical fixes:
- Parallelize fanout calls
- Use hedged requests for critical reads
- Add fallbacks (partial UI/data)
- Precompute when it’s cheaper than computing on read
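Hedged requests are the least familiar item in the list above, so here is a minimal `asyncio` sketch. The `hedged` helper and its `hedge_after_s` parameter are hypothetical; the approach assumes the call is an idempotent read and that the hedge delay is set near the dependency's p95, so only a small fraction of requests are duplicated.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def hedged(make_call: Callable[[], Awaitable[Any]], hedge_after_s: float) -> Any:
    """Start one call; if it has not finished after hedge_after_s, race a second one."""
    first = asyncio.ensure_future(make_call())
    done, _ = await asyncio.wait({first}, timeout=hedge_after_s)
    if done:
        return first.result()  # fast path: no hedge needed
    second = asyncio.ensure_future(make_call())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the loser so it does not keep consuming resources
    return done.pop().result()
```

As a simplification, the first call to complete wins even if it failed; a fuller version would wait for the other call when the first one raises. Hedging trades a small amount of extra load for a much shorter tail, so it belongs on reads where that trade is worth it.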
Network bottlenecks people discover too late
- Connection churn (ephemeral ports, TIME_WAIT)
- TLS handshakes (reuse connections and sessions; terminate at the edge)
- Cross-zone chatter (expensive + slow)
10) Observability: Debuggability Is a Feature
The “triad”
- Metrics: low-cardinality aggregates (SLOs)
- Logs: high-cardinality details
- Traces: causality across services
Golden signals (start here)
- Latency (p50/p95/p99)
- Traffic (QPS)
- Errors (rate + types)
- Saturation (CPU, memory, queue depth, thread pools)
Cardinality warning
Never put unbounded IDs (user_id, request_id, email) into metric labels.
11) Security & Abuse (Often the Real Bottleneck)
Minimum viable security for “internet-facing” systems:
- AuthN/AuthZ (token validation, scopes)
- Rate limiting + WAF
- Input validation + payload limits
- Secrets management
- Audit logs for sensitive actions
Abuse patterns to plan for:
- Credential stuffing
- Scraping and bot traffic
- Multi-tenant noisy neighbor attacks
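Rate limiting is the first line of defense against most of these abuse patterns. A token bucket is one common way to implement it; the sketch below is a single-process version, and the `TokenBucket` class, its parameters, and the injectable `clock` are assumptions for illustration. A multi-instance service would keep the bucket state in a shared store, keyed per client or tenant.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `burst` requests, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_s
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so initial bursts are allowed
        self._clock = clock
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should respond with 429 or shed the request
```

One bucket per tenant also doubles as a simple bulkhead against noisy neighbors: a tenant that exhausts its own bucket cannot consume capacity reserved for others.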
12) Migrations: The Part That Separates Theory from Reality
Safe schema changes (boring but crucial)
- Expand → backfill → switch reads → switch writes → contract
- Dual writes only with reconciliation and end-to-end idempotency
Data migration playbook
- Backfill in small batches
- Verify with checksums / sampling
- Feature-flag the cutover
- Keep a rollback path
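The first two playbook steps (batched backfill, checksum verification) can be sketched with the standard `sqlite3` module. The `users_old` and `users_new` tables, their columns, and both helper functions are hypothetical; the sketch assumes positive integer primary keys and a source table that is not being written during the copy.

```python
import hashlib
import sqlite3


def backfill(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Copy rows from users_old to users_new in primary-key order, in small batches."""
    last_id, copied = 0, 0
    while True:
        rows = conn.execute(
            "SELECT id, email FROM users_old WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return copied
        # INSERT OR REPLACE keeps the backfill safe to re-run after a crash.
        conn.executemany(
            "INSERT OR REPLACE INTO users_new (id, email) VALUES (?, ?)", rows
        )
        conn.commit()  # one short transaction per batch, no long-held locks
        last_id = rows[-1][0]
        copied += len(rows)


def table_checksum(conn: sqlite3.Connection, table: str) -> str:
    """Order-stable digest of a table; `table` must be a trusted name, not user input."""
    digest = hashlib.sha256()
    for row in conn.execute(f"SELECT id, email FROM {table} ORDER BY id"):
        digest.update(repr(row).encode())
    return digest.hexdigest()


# Hypothetical source and target tables for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users_old (id INTEGER PRIMARY KEY, email TEXT)")
db.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, email TEXT)")
db.executemany(
    "INSERT INTO users_old VALUES (?, ?)",
    [(i, f"u{i}@example.com") for i in range(1, 1201)],
)
db.commit()
```

Paging by `id > last_id` (keyset pagination) rather than `OFFSET` keeps each batch cheap no matter how far the backfill has progressed. On large tables, a full checksum is often replaced by per-range checksums or sampling, as the playbook suggests.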
13) A Compact “Design Interview” Script (Repeatable)
- Requirements + SLOs
- APIs + core entities
- High-level architecture
- Data model + storage choice
- Consistency + failure modes
- Scaling strategy (shards, caches, async)
- Observability + ops + migrations
If you can do this smoothly, you will look like someone who has shipped production systems.
14) The 8 Red Flags (Expert Smell Test)
- “We’ll add retries” (without idempotency)
- “We’ll shard later” (no plan)
- “We store files in the DB” (without a reason)
- “We use exactly-once” (without defining it)
- “We’ll use microservices for scale” (without a latency plan)
- “We’ll do active-active” (without conflict resolution)
- “Metrics per user” (cardinality explosion)
- “No backpressure needed” (death spiral incoming)
Optional: Your Personal Mastery Loop
To go from “knows patterns” to “expert”:
- Pick a system (feed, payments, chat, search).
- Write its invariants.
- List failure modes.
- Add mitigations (timeouts, idempotency, backpressure).
- Design migrations.
That’s how you build real instincts.