Capacity Estimation
Back-of-envelope estimation methodology for deriving QPS, storage, and bandwidth requirements from user-level assumptions — the first step of every system design interview.
When NOT to Use
- Not a production capacity planning tool: use load testing and real traffic data for production sizing; back-of-envelope estimates cannot substitute for profiling under real traffic
- Estimates are order-of-magnitude — a margin of error of 10x is acceptable in interviews; precise per-component sizing requires infrastructure telemetry and benchmarking
- Do not carry capacity numbers from this note verbatim into case study notes — derive numbers from DAU for each specific system; the latency reference table and power-of-two table are shared, but QPS and storage numbers are system-specific
Core Mechanism
DAU-to-QPS Derivation Chain
The fundamental formula for translating a user-level assumption into a server-level throughput target:
QPS = DAU * avg_requests_per_user / 86_400
Where:
- DAU: Daily Active Users (the user-level anchor; get this number first)
- avg_requests_per_user: average API calls per user per day across all actions (read, write, search, like, etc.)
- 86_400: seconds in a day (60 * 60 * 24); the constant denominator for any per-day-to-per-second conversion
- Peak QPS multiplier: typically 2x–3x average; systems must handle bursts, not just averages
- Read/write ratio heuristics: typical systems are 10:1 to 100:1 read-heavy; the ratio drives architectural decisions (read replicas, CQRS read model, cache tier sizing)
Worked example: social media platform with 300M DAU (2024 estimate)
// Step 1: requests per user per day
avg_requests = 10
// Reasoning: read timeline 3x + post 0.5x + like 3x + search 3.5x
// = 10 actions/day; this is a judgment call, not a fact — state assumptions
// Step 2: total requests per day
requests_per_day = 300M * 10 = 3B
// = 3,000,000,000 requests per day across all users
// Step 3: average QPS
avg_QPS = 3B / 86_400 ≈ 35,000 QPS
// = 3,000,000,000 / 86,400 = 34,722; round to ~35,000 QPS
// Use this as the baseline; do not skip showing the division
// Step 4: peak QPS (assume 2x average for a social feed; 3x for live events)
peak_QPS = 35,000 * 2 = 70,000 QPS
// The multiplier accounts for primetime traffic bursts; state the multiplier explicitly
// Step 5: read/write split (typical: 90% reads for a social timeline)
read_QPS = 70,000 * 0.9 = 63,000 QPS
write_QPS = 70,000 * 0.1 = 7,000 QPS
// The 90/10 split motivates separate read and write scaling paths
// → read_QPS drives cache sizing and read replica count
// → write_QPS drives write throughput requirements and leader node sizing
Never state "12K QPS" without showing the derivation: "10M DAU * 100 req/day / 86,400s = 11,574 QPS, round to ~12K QPS." Bare numbers cannot be audited or adjusted when assumptions change. In a system design interview, showing the derivation demonstrates that you understand the estimation methodology, not just the answer.
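The derivation chain above can also be written as a small function, which makes each assumption an explicit parameter. This is a minimal sketch, not a sizing tool: the names `estimate_qps` and `QpsEstimate` are illustrative, and the defaults (2x peak multiplier, 90% reads) are the assumptions from the worked example.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400  # 60 * 60 * 24


@dataclass
class QpsEstimate:
    avg_qps: float
    peak_qps: float
    read_qps: float
    write_qps: float


def estimate_qps(dau, requests_per_user, peak_multiplier=2.0, read_fraction=0.9):
    """Derive average, peak, and read/write QPS from user-level assumptions."""
    requests_per_day = dau * requests_per_user       # Step 2: total requests/day
    avg_qps = requests_per_day / SECONDS_PER_DAY     # Step 3: average QPS
    peak_qps = avg_qps * peak_multiplier             # Step 4: peak QPS
    read_qps = peak_qps * read_fraction              # Step 5: read share of peak
    write_qps = peak_qps - read_qps                  # remainder is writes
    return QpsEstimate(avg_qps, peak_qps, read_qps, write_qps)


# Assumed inputs from the worked example: 300M DAU, 10 requests/user/day.
estimate = estimate_qps(dau=300_000_000, requests_per_user=10)
print(f"avg   ~ {estimate.avg_qps:,.0f} QPS")
print(f"peak  ~ {estimate.peak_qps:,.0f} QPS")
print(f"read  ~ {estimate.read_qps:,.0f} QPS")
print(f"write ~ {estimate.write_qps:,.0f} QPS")
```

The function does not round between steps, so it reports roughly 34,722 / 69,444 / 62,500 / 6,944 QPS. The hand calculation rounds to 35,000 first and therefore lands on 70,000 / 63,000 / 7,000. Both are the same estimate at order-of-magnitude precision.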
Storage and Bandwidth Formulas
Storage formula:
total_storage = storage_per_record * records_per_day * retention_days
Where retention_days is a product requirement (e.g., 5 years = 1,825 days) and storage_per_record is the average serialised size of one entity including metadata.
Bandwidth formula:
bandwidth = throughput * message_size
Where throughput is the write QPS (for ingress bandwidth) or the read QPS (for egress bandwidth) and message_size is the average payload size in that direction: the request body for ingress, the response body for egress.
Worked example:
// If each record is 1 KB, 100K writes/day, 5-year retention:
total_storage = 1 KB * 100,000 * 1,825 days
= 1,000 bytes * 100,000 * 1,825
= 182,500,000,000 bytes
≈ 182 GB
// Round to ~200 GB accounting for metadata and index overhead (10–20%)
// At this scale a single PostgreSQL primary is sufficient; sharding is premature
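Both formulas are sketched below in Python. The storage inputs are the ones from the worked example; the bandwidth inputs (63,000 read QPS and a 1 KB response) are assumptions chosen only to illustrate the formula. The function names are hypothetical.

```python
def total_storage_bytes(bytes_per_record, records_per_day, retention_days):
    """total_storage = storage_per_record * records_per_day * retention_days"""
    return bytes_per_record * records_per_day * retention_days


def bandwidth_bytes_per_sec(qps, message_bytes):
    """bandwidth = throughput * message_size"""
    return qps * message_bytes


# Storage: 1 KB records (taken as 1,000 bytes), 100K writes/day, 5 years.
storage = total_storage_bytes(1_000, 100_000, 1_825)
print(f"raw storage: {storage / 1e9:.1f} GB")  # 182.5 GB

# Index and metadata overhead of 10-20%, as assumed in the worked example.
low, high = storage * 1.10, storage * 1.20
print(f"with overhead: {low / 1e9:.0f}-{high / 1e9:.0f} GB")

# Egress: assumed 63,000 read QPS with an assumed 1 KB average response.
egress = bandwidth_bytes_per_sec(63_000, 1_000)
print(f"egress: {egress / 1e6:.0f} MB/s = {egress * 8 / 1e6:.0f} Mbit/s")
```

Network links are usually quoted in bits per second, so multiplying by 8 at the end avoids an easy factor-of-eight mistake.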
Reference Tables
Power-of-Two Table
These values are the common currency of back-of-envelope storage estimates. Memorize all quantities in this table; an interviewer expects instant recall.
| Power | Approximate Value | Full Name | Short Name |
|---|---|---|---|
| 2^10 | 1 Thousand | 1 Kilobyte | 1 KB |
| 2^20 | 1 Million | 1 Megabyte | 1 MB |
| 2^30 | 1 Billion | 1 Gigabyte | 1 GB |
| 2^40 | 1 Trillion | 1 Terabyte | 1 TB |
| 2^50 | 1 Quadrillion | 1 Petabyte | 1 PB |
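The "Approximate Value" column treats 2^10 as 10^3. A short sketch shows how far the approximation drifts at each row. It stays well inside the 10x error margin that back-of-envelope work tolerates.

```python
# Compare each power of two with the decimal value the table pairs it with.
rows = [(10, 3, "KB"), (20, 6, "MB"), (30, 9, "GB"), (40, 12, "TB"), (50, 15, "PB")]

for binary_exp, decimal_exp, unit in rows:
    exact = 2 ** binary_exp
    approx = 10 ** decimal_exp
    drift_pct = (exact / approx - 1) * 100
    print(f"2^{binary_exp} = {exact:,} bytes (1 {unit}), {drift_pct:.1f}% above 10^{decimal_exp}")
```

The drift grows from about 2.4% at 1 KB to about 12.6% at 1 PB, which is why treating 1 KB as 1,000 bytes in an estimate is harmless.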
Latency Numbers to Know (2024 reference)
Use these to reason about what a system "can" and "cannot" do within a given SLA. The order-of-magnitude differences (ns vs ms) are the critical insight: memory access is 6 orders of magnitude faster than a cross-datacenter round trip.
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| SSD random read (4KB) | 100 microseconds |
| Network round trip (same DC) | 500 microseconds |
| Disk seek | 10 ms |
| Network round trip (cross-DC) | 150 ms |
Latency numbers shift with hardware generations, so annotate them with a year (e.g., "2024 reference") to let estimates be updated as hardware improves. SSD random read was 150 microseconds a decade ago; the table above reflects current NAND SSD random read, while current NVMe drives approach 10 microseconds for sequential reads.
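One way to use the table is to ask how many sequential operations of a given kind fit inside a latency budget. The sketch below encodes the table in nanoseconds and assumes a hypothetical 200 ms SLA. The helper name `sequential_ops_within` is illustrative.

```python
# Latency table from this note (2024 reference), in nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "l2_cache": 7,
    "main_memory": 100,
    "ssd_random_read_4kb": 100_000,
    "same_dc_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "cross_dc_round_trip": 150_000_000,
}


def sequential_ops_within(budget_ms, operation):
    """How many back-to-back operations of this kind fit in the budget."""
    budget_ns = budget_ms * 1_000_000
    return int(budget_ns // LATENCY_NS[operation])


SLA_MS = 200  # assumed request budget, for illustration only
for op in ("same_dc_round_trip", "disk_seek", "cross_dc_round_trip"):
    print(f"{op}: {sequential_ops_within(SLA_MS, op)} within {SLA_MS} ms")

# Ratio behind the "6 orders of magnitude" claim: cross-DC vs main memory.
ratio = LATENCY_NS["cross_dc_round_trip"] / LATENCY_NS["main_memory"]
print(f"cross-DC round trip is {ratio:,.0f}x a main memory reference")
```

Under these numbers a 200 ms budget allows 400 same-DC round trips and 20 disk seeks, but only one cross-DC round trip. A request path that crosses datacenters twice in sequence cannot meet that SLA.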
Component Diagram
Capacity-Estimation-diagram.excalidraw
Pitfalls
The two [!warning] callouts above cover the primary pitfalls (bare numbers without reasoning chain, and stale reference numbers). Two secondary pitfalls:
- Confusing average QPS with peak QPS: Sizing systems for average throughput means they fail under peak load. Always derive peak QPS explicitly and size for peak, not average. State the peak multiplier assumption (2x for ordinary daily peaks such as primetime on a social feed, 3x for bursty traffic patterns like live events or flash sales).
- Ignoring compression and encoding: Real payloads are compressed (gzip, zstd) and encoded (protobuf vs JSON). A 1 KB JSON record may compress to 300 bytes. For bandwidth estimates, apply a compression ratio appropriate to the content type; images and video have minimal compressibility, structured JSON compresses well (3–5x).
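The compression pitfall can be checked empirically rather than guessed. The sketch below measures a compression ratio on synthetic JSON using Python's standard `zlib` module and then applies an assumed ratio to an egress estimate. The sample records, the 63,000 QPS figure, and the 3x ratio are illustrative assumptions; real ratios depend on the payload.

```python
import json
import zlib

# Synthetic, repetitive JSON records (an assumption; real payloads will differ).
records = [
    {"user_id": i, "action": "like", "post_id": 1000 + i, "ts": 1_700_000_000 + i}
    for i in range(50)
]
raw = json.dumps(records).encode("utf-8")
compressed = zlib.compress(raw, 6)  # DEFLATE, the algorithm behind gzip
measured_ratio = len(raw) / len(compressed)
print(f"raw={len(raw)} B, compressed={len(compressed)} B, ratio={measured_ratio:.1f}x")


def egress_bytes_per_sec(qps, payload_bytes, compression_ratio):
    """Bandwidth after compression: qps * payload / ratio."""
    return qps * payload_bytes / compression_ratio


# Assumed: 63,000 read QPS, 1 KB JSON responses, 3x compression.
uncompressed = egress_bytes_per_sec(63_000, 1_000, 1.0)
with_compression = egress_bytes_per_sec(63_000, 1_000, 3.0)
print(f"egress: {uncompressed / 1e6:.0f} MB/s raw, {with_compression / 1e6:.0f} MB/s compressed")
```

Measuring on a sample of representative payloads is more defensible than quoting a generic ratio. For already-compressed media such as images and video, a ratio close to 1.0 is the safer assumption.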
Existing Pattern Connections
- CQRS-Pattern — read/write ratio analysis (10:1 to 100:1) directly motivates CQRS's separate read model; the estimation step quantifies the read/write asymmetry that justifies the pattern: when write_QPS is 7K and read_QPS is 63K, a shared read/write data model creates contention that a separate read model (and read replica or cache tier) resolves
- API-Protocol-Selection-MOC — HTTP request sizing (JSON vs protobuf vs GraphQL response shape) informs bandwidth estimates; protocol choice affects per-request byte cost; a protobuf-encoded payload is typically 3–5x smaller than the equivalent JSON, which directly reduces egress bandwidth and shifts the read bandwidth calculation
- Operational-API-Patterns — pagination and rate limiting strategies affect per-request payload size and throughput calculations; a paginated endpoint returning 20 items per page produces very different storage and bandwidth numbers than one returning 200 items; rate limiting constrains write_QPS at the API layer, which must be accounted for when sizing write throughput