System Design MOC

Navigation hub for 19 system design notes: 11 building blocks (4 foundation + 7 infrastructure) and 8 case studies (4 simple + 4 scale-heavy). Use the decision framework below to select building blocks, or browse by category. See Design-Patterns-MOC for the root vault entry point.


Foundation Building Blocks

| Pattern | Intent | Use When |
| --- | --- | --- |
| Capacity-Estimation | DAU-to-QPS derivation, storage/bandwidth formulas, and latency reference numbers for system design interviews. | Starting any system design problem -- estimate scale before choosing components |
| CAP-Theorem | Classify distributed systems by consistency model (strong, eventual, causal) and partition tolerance tradeoffs. | Choosing between CP and AP systems; evaluating database consistency guarantees |
| Consistent-Hashing | Map keys to nodes on a hash ring with virtual nodes for balanced distribution and minimal remapping on node changes. | Distributing data or load across a dynamic set of nodes (caches, databases, load balancers) |
| Bloom-Filter | Probabilistic set membership test: guaranteed no false negatives, tunable false positive rate. | Cache penetration prevention, URL deduplication, pre-filtering before expensive lookups |

Foundation selection guide: Start every system design with Capacity-Estimation -- the numbers drive all subsequent decisions. Use CAP-Theorem when choosing between consistency and availability for distributed storage. Use Consistent-Hashing when you need to distribute data or load across nodes that can be added or removed. Use Bloom-Filter when you need fast approximate set membership to avoid expensive lookups.
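
As a rough illustration of the Consistent-Hashing row above, the sketch below places each node on a ring at several virtual positions and routes a key to the first position clockwise from its hash. The class name `HashRing`, the `vnodes` parameter, the node names, and the choice of MD5 are assumptions made for this example; the Consistent-Hashing note may use a different construction.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a ring position (MD5 is used for spread, not security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual positions per physical node (assumed value)
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node claims `vnodes` positions to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring is empty")
        # First position clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-d")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

With this construction, every key that moves lands on the newly added node; keys owned by the other nodes stay put, which is the "minimal remapping" property the table refers to.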


Infrastructure Building Blocks

| Pattern | Intent | Use When |
| --- | --- | --- |
| Load-Balancer | Distribute incoming traffic across server instances using L4/L7 algorithms (round-robin, least connections, consistent hashing). | Any system serving more traffic than a single server can handle |
| Distributed-Cache | Store frequently accessed data in memory (cache-aside, write-through, write-behind) to reduce database load. | Read-heavy workloads; database latency too high; need sub-millisecond response times |
| Content-Delivery-Network | Serve static and cacheable content from edge servers geographically close to users. | Global user base; static assets (images, JS, CSS); video streaming; latency-sensitive content delivery |
| Message-Queue | Decouple producers from consumers with point-to-point or pub/sub delivery and configurable guarantees. | Async processing; traffic spike buffering; service-to-service decoupling |
| SQL-vs-NoSQL | Decision framework for relational vs non-relational storage based on access patterns, consistency, and schema flexibility. | Choosing a database for a new system or evaluating whether to migrate from one model to another |
| Database-Replication | Copy data across database nodes (single-leader, multi-leader, leaderless) for availability and read scaling. | Read scaling; high availability; disaster recovery; geographic distribution |
| Database-Sharding | Partition data horizontally across database nodes using range, hash, or directory strategies. | Single database node cannot handle write volume or storage size; need horizontal write scaling |

Infrastructure selection guide: For traffic distribution, start with Load-Balancer. For read latency, add Distributed-Cache (application layer) or Content-Delivery-Network (edge layer). For async processing and decoupling, use Message-Queue. For storage selection, use SQL-vs-NoSQL as the decision framework. For read scaling or HA, add Database-Replication. For write scaling, use Database-Sharding -- but only after exhausting vertical scaling and read replicas.
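
To make the cache-aside strategy named in the Distributed-Cache row concrete, here is a minimal in-process sketch. Plain dictionaries stand in for the cache and the database, and the class name `CacheAside`, the TTL value, and the invalidate-on-write policy are assumptions for illustration rather than what the Distributed-Cache note prescribes.

```python
import time


class CacheAside:
    """Hypothetical cache-aside wrapper; dicts stand in for a real cache and database."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self.db = db
        self.ttl = ttl_seconds   # assumed expiry window
        self.clock = clock
        self.cache = {}          # key -> (value, expires_at)
        self.db_reads = 0        # counter to make the effect visible

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                      # cache hit
        self.db_reads += 1
        value = self.db.get(key)                 # cache miss: read the database
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value                     # write the source of truth first
        self.cache.pop(key, None)                # then invalidate the cached copy


store = CacheAside({"user:1": "Ada"})
store.get("user:1")   # miss, loads from the database
store.get("user:1")   # hit, served from memory
print(store.db_reads)
```

The application owns both the read fallback and the invalidation here; write-through and write-behind move that responsibility into the cache layer instead.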


Case Studies

Simple

| Case Study | Central Technical Problem | Key Building Blocks |
| --- | --- | --- |
| URL-Shortener-Design | Base62 encoding with collision handling at read-heavy scale | Distributed-Cache, Database-Replication, Load-Balancer |
| Rate-Limiter-Design | Distributed state consistency across rate limiter nodes | Distributed-Cache, Load-Balancer |
| Unique-ID-Generator-Design | Globally unique IDs without coordination across datacenters | Database-Sharding (shard key generation) |
| Notification-System-Design | Reliable cross-channel fan-out with per-user rate limiting | Message-Queue, Distributed-Cache, Load-Balancer |
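
The Base62 encoding mentioned for URL-Shortener-Design can be sketched as below. This assumes the short code is derived from a unique integer ID, which is one common approach; a hash-based scheme would also need the collision check the table mentions, and that is not shown here. The alphabet ordering and function names are choices made for this example.

```python
import string

# 0-9, a-z, A-Z: 62 symbols (the ordering is an assumption of this sketch).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))   # most significant digit first


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(base62_encode(62))               # "10"
print(base62_decode(base62_encode(2024)))
```

Seven characters cover 62^7 = 3,521,614,606,208 distinct IDs, roughly 3.5 trillion.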

Scale-Heavy

| Case Study | Central Technical Problem | Key Building Blocks |
| --- | --- | --- |
| News-Feed-Design | Celebrity problem -- fan-out-on-write breaks at 10M+ followers | Distributed-Cache, Message-Queue, Database-Sharding |
| Chat-System-Design | Message ordering guarantees + offline delivery | Message-Queue, Database-Sharding, Load-Balancer |
| Search-Autocomplete-Design | Top-K precomputation: latency vs freshness tradeoff | Distributed-Cache, Load-Balancer |
| Web-Crawler-Design | Distributed URL frontier + politeness without coordination bottleneck | Message-Queue, Bloom-Filter, Consistent-Hashing |
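
One commonly described mitigation for the celebrity problem in News-Feed-Design is a hybrid: push posts to follower inboxes for ordinary authors, and pull posts from high-follower authors at read time. The sketch below illustrates that idea only; the News-Feed-Design note may settle on a different approach, and `FeedService`, `CELEBRITY_THRESHOLD`, and the in-memory dictionaries are all hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # toy cutoff for the demo; a real system would use a far larger value


class FeedService:
    """Hypothetical hybrid fan-out: push for ordinary authors, pull for celebrities."""

    def __init__(self, followers):
        self.followers = followers                 # author -> set of follower ids
        self.following = defaultdict(set)          # follower -> set of authors
        for author, fans in followers.items():
            for fan in fans:
                self.following[fan].add(author)
        self.inbox = defaultdict(list)             # precomputed feeds (fan-out on write)
        self.posts_by_author = defaultdict(list)   # read at feed time for celebrities
        self._seq = 0                              # monotonically increasing post id

    def is_celebrity(self, author):
        return len(self.followers.get(author, ())) >= CELEBRITY_THRESHOLD

    def publish(self, author, text):
        self._seq += 1
        post = (self._seq, author, text)
        self.posts_by_author[author].append(post)
        if not self.is_celebrity(author):
            # Fan-out on write: cost grows with the follower count.
            for fan in self.followers.get(author, ()):
                self.inbox[fan].append(post)

    def feed(self, user, limit=10):
        posts = list(self.inbox[user])
        # Fan-out on read: merge in posts from followed celebrities.
        for author in self.following[user]:
            if self.is_celebrity(author):
                posts.extend(self.posts_by_author[author])
        return sorted(posts, reverse=True)[:limit]   # newest first


svc = FeedService({"star": {"a", "b", "c"}, "bob": {"a"}})
svc.publish("bob", "hi")
svc.publish("star", "hello")
print(svc.feed("a"))
```

The write cost for a celebrity post stays constant, at the price of extra merge work on every read by their followers.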

Case study selection guide: Simple case studies (URL Shortener, Rate Limiter, Unique ID Generator, Notification System) each focus on one core technical problem and draw on 1-3 building blocks. Scale-heavy case studies (News Feed, Chat, Search Autocomplete, Web Crawler) involve multiple interacting subsystems and suit the depth of a 45-minute interview. Start with the simple case studies to build fluency before attempting the scale-heavy ones.


Selection Guide -- Which Building Block for Which Problem

Is the problem about handling more traffic than one server can serve?
  • Yes: Load-Balancer (then consider Distributed-Cache for read latency)
  • No: Continue

Is the problem about reducing database read latency or load?
  • Yes, application-level caching: Distributed-Cache
  • Yes, static content to global users: Content-Delivery-Network
  • No: Continue

Is the problem about decoupling services or handling async workloads?
  • Yes: Message-Queue
  • No: Continue

Is the problem about choosing a database model?
  • Yes: SQL-vs-NoSQL
  • No: Continue

Is the problem about database read scaling or high availability?
  • Yes: Database-Replication
  • No: Continue

Is the problem about database write scaling or storage limits?
  • Yes: Database-Sharding (ensure you have a good shard key -- see Consistent-Hashing)
  • No: Continue

Is the problem about distributed consistency guarantees?
  • Yes: CAP-Theorem (then choose CP or AP based on your requirements)
  • No: Continue

Is the problem about fast approximate set membership?
  • Yes: Bloom-Filter
  • No: Re-examine your requirements -- the building block may not be in this vault yet
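
The last question in the framework above points to Bloom-Filter. As a minimal sketch of what "no false negatives, tunable false positive rate" means in practice, the example below sizes a bit array from an expected item count and a target error rate using the standard formulas, then derives its probe positions from a single SHA-256 digest (double hashing). The class name, parameter names, and hashing scheme are assumptions for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical Bloom filter sized from expected item count and target error rate."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.size = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
seen.add("https://example.com/a")
print("https://example.com/a" in seen)   # True: added items are always reported
print("https://example.com/b" in seen)   # almost always False; a false positive is possible
```

A `False` answer lets the caller skip the expensive lookup entirely; a `True` answer still needs confirmation against the real store.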


Interview Preparation Guide

1. Clarify (2-3 minutes)

  • Identify functional requirements (what the system does)
  • Identify non-functional requirements (scale, latency, availability, consistency)
  • Ask about user scale: DAU, peak QPS, geographic distribution
  • Use Capacity-Estimation methodology to anchor the conversation in numbers

2. Estimate (3-5 minutes)

  • Apply Capacity-Estimation formulas: DAU -> QPS -> storage -> bandwidth
  • State assumptions explicitly and annotate with reasoning chains
  • Key variables: read/write ratio, average object size, retention period
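
The DAU -> QPS -> storage -> bandwidth chain can be sketched as a small calculator. The function name, parameter names, the sample inputs, and the peak multiplier of 2 are assumptions chosen for illustration; the Capacity-Estimation note may define the formulas with different variables.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, reads_per_user, writes_per_user, avg_object_bytes,
             retention_days, peak_factor=2.0):
    """Back-of-envelope estimate; peak_factor is an assumed multiplier, not a measured one."""
    write_qps = dau * writes_per_user / SECONDS_PER_DAY
    read_qps = dau * reads_per_user / SECONDS_PER_DAY
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "peak_read_qps": read_qps * peak_factor,
        # Storage: every written object is kept for the full retention period.
        "storage_bytes": dau * writes_per_user * avg_object_bytes * retention_days,
        # Bandwidth: objects written per second in, objects read per second out.
        "ingress_bytes_per_s": write_qps * avg_object_bytes,
        "egress_bytes_per_s": read_qps * avg_object_bytes,
    }


# Hypothetical inputs: 10M DAU, 10 reads and 1 write per user per day,
# 1,000-byte objects, one year of retention.
result = estimate(10_000_000, 10, 1, 1_000, 365)
print(f"write QPS ~ {result['write_qps']:.0f}, read QPS ~ {result['read_qps']:.0f}")
print(f"storage ~ {result['storage_bytes'] / 1e12:.2f} TB")
```

With these inputs the read/write ratio is 10:1, average write load is about 116 QPS, and a year of data comes to 3.65 TB before replication.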

3. Design (15-20 minutes)

  • Identify high-level components using building blocks from this MOC
  • Name the central technical problem (the one thing that makes this system hard)
  • Make database choice using SQL-vs-NoSQL framework
  • Add Load-Balancer, Distributed-Cache, Message-Queue where justified by capacity estimates

4. Deep Dive (10-15 minutes)

  • Address the central technical problem in detail
  • Discuss tradeoffs (consistency vs availability per CAP-Theorem, latency vs throughput)
  • Identify failure modes and mitigations
  • Reference specific case study notes above for worked examples


Sources

Building block notes are based on Kleppmann, Designing Data-Intensive Applications (2017), and Xu, System Design Interview Vol. 1 (2020) and Vol. 2 (2022). Case study notes follow the Xu framework with reasoning chain annotations.