Rate Limiter Design
Rate Limiter Design
A system that controls the rate of requests a client can send to a service, protecting backend resources from overload and ensuring fair usage across consumers.
Scope note: This note covers system design of a rate limiter — the algorithms, distributed state management, and architectural placement decisions. The HTTP contract layer (429 status, Retry-After header, X-RateLimit response headers, client backoff) is documented in Operational-API-Patterns. This note does not re-explain HTTP response semantics.
Clarify First
Before designing, lock these assumptions — each shapes a different component:
- Per-user, per-IP, or per-API-key? Different key granularity drives the storage schema; per-user requires authenticated identity resolution before the counter lookup.
- Hard or soft limit? Hard: requests above the limit are rejected immediately. Soft: allow burst above limit with degraded SLA or queuing.
- What actions are rate limited? All endpoints, or only specific high-cost operations (e.g., auth, file upload, third-party calls)?
- Client-side or server-side? Throttle at API gateway, service middleware, or client library? Placement changes where state lives and who enforces the policy.
- Distributed system or single server? Single server is trivial (in-memory counter). Distributed is the interesting problem — the core design challenge.
Capacity Estimation
Assumption: rate limiter as a shared service for a mid-size API platform, 2026.
API requests to limit: 10M QPS total across all clients
Rate limit decisions: 1 decision per API request = 10M decisions/sec
Redis ops per decision: ~2 ops (INCR + EXPIRE, or sliding window lookups)
Required Redis throughput: 10M * 2 = 20M ops/sec
Redis per instance: ~100K–500K ops/sec (commodity hardware, 2026)
Required instances: 20M / 300K ≈ 67 instances → Redis Cluster with N shards
Storage per rate limit key:
key = "rate_limit:{user_id}:{window}" ≈ 50 bytes
Active concurrent users: 1M
Storage: 1M * 50 bytes = 50 MB
→ fits in a single Redis instance; sharding is for throughput, not storage
Cross-link: Capacity-Estimation for the shared derivation methodology.
Central Technical Problem
Distributed state consistency across rate limiter nodes.
When a rate limiter runs on N stateless application servers, each node has only its local counter. A user sending 10 concurrent requests may distribute across 10 nodes, each seeing count=1 against a limit of 5 — resulting in 10 allowed requests instead of 5.
Three solutions with distinct tradeoffs:
-
Centralized counter (Redis INCR): Atomic counter in shared Redis. Every request pays a network round-trip to Redis. Redis becomes a single point of failure unless clustered. This is the standard production approach.
-
Sticky routing: Route each
client_idto a dedicated rate limiter node via consistent hashing. Eliminates shared state but breaks on node failure — the client's counter is lost and limits reset. Prevents horizontal scale of the rate limiter itself. -
Eventual consistency with relaxed guarantees: Accept slight over-counting (nodes sync periodically). Suitable for soft limits where ~3% over-counting is acceptable. Avoids Redis as a bottleneck at the cost of enforcement accuracy.
Component Design
Algorithm Comparison
| Algorithm | How It Works | Memory | Accuracy | Burst Handling |
|---|---|---|---|---|
| Fixed window counter | Increment counter per window (e.g., 1-minute slots); reset at boundary | Low (1 int per key/window) | Boundary exploit: 2x burst possible straddling window reset | Allows burst at window boundary |
| Sliding window log | Store timestamp of every request; count requests in last 60s | High (1 entry per request) | Exact — no boundary exploit | Smooth; high memory cost |
| Sliding window counter | Blend current and previous window counters with time-decay weight | Low (2 ints per key) | ~0.003% error (Redis Labs study) | Good approximation of smooth |
| Token bucket | Tokens accumulate at fixed rate up to capacity; each request consumes 1 token | Low (token count + last_refill_time) | Smooth; burst allowed up to bucket capacity | Natural burst tolerance |
| Leaky bucket | Fixed-rate queue drain; excess requests queued or dropped | Low | Smooth output; bursty input absorbed into queue | Queue-based absorption |
Redis Implementation Patterns
// Fixed window — INCR + EXPIRE (note: INCR and EXPIRE are not atomic together)
key = "rate_limit:{user_id}:{floor(timestamp / window_seconds)}"
count = INCR key
IF count == 1: EXPIRE key window_seconds // race window: INCR succeeds, EXPIRE may not
IF count > limit: reject request
// Sliding window log — ZADD + ZCARD (sorted set of timestamps)
key = "rate_limit:{user_id}"
now = current_unix_ms
ZADD key now now // add current timestamp
ZREMRANGEBYSCORE key 0 (now - window_ms) // expire old entries
count = ZCARD key
IF count > limit: reject request
// Atomic INCR+EXPIRE via Lua script (eliminates the fixed-window race condition)
local count = redis.call('INCR', KEYS[1])
if count == 1 then
redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
The Lua script runs as a single atomic operation in Redis — no other command can interleave between INCR and EXPIRE.
Rate Limiter Placement
| Placement | Pros | Cons |
|---|---|---|
| API Gateway | Single enforcement point; no per-service work; cross-cutting concerns centralised | Gateway becomes a bottleneck; coarse granularity (per-route, not per-operation) |
| Service-level middleware | Fine-grained per-endpoint limits; no gateway dependency | Duplicated enforcement logic across services; each service manages its own Redis connection pool |
| Client-side | Zero server load | Unenforceable — clients can ignore or bypass |
System Diagram
Rate-Limiter-Design-diagram.excalidraw
Alternatives Considered
| Decision | Alternative | Why Chosen Approach Wins |
|---|---|---|
| Centralized Redis counter | Local in-memory counters per node | Local counters allow over-counting in distributed deployments; Redis provides atomic enforcement |
| Lua script for INCR+EXPIRE | Two separate Redis calls | Separate calls have a race window; Lua script is atomic at the Redis server level |
| Token bucket algorithm | Fixed window counter | Token bucket allows natural bursts up to bucket capacity without boundary exploits; fixed window double-counts at window resets |
| API Gateway placement | Per-service middleware | Gateway centralises enforcement and removes per-service boilerplate; tradeoff is gateway becoming a bottleneck |
| Redis Cluster sharding via consistent hashing | Replicated single Redis node | Single node becomes throughput bottleneck at 10M QPS; consistent hashing distributes key space without hotspots |
Likely Follow-Up Questions
-
What happens when Redis goes down? Fail-open (allow all traffic — protect availability) vs fail-closed (block all traffic — protect the backend). The choice is a business decision. Circuit-Breaker-Pattern governs the Redis fallback behaviour.
-
How do you handle rate limiting across multiple datacenters? Options: global counter via cross-datacenter Redis replication (high latency, strong consistency) vs per-datacenter counters (low latency, slight over-counting globally). Most systems accept per-datacenter enforcement for performance.
-
How do you support different rate limits for different API tiers? Rule store: a separate service or config layer maps
{client_id, endpoint}→{limit, window}. The rate limiter looks up the rule before applying the counter. -
How do you avoid Redis as a bottleneck at 10M+ QPS? Shard the key space across a Redis Cluster using Consistent-Hashing — each shard handles a partition of
client_idspace. Add local in-memory caching for rule lookups (immutable data). -
How do you test rate limiting in production? Shadow traffic: replay production traffic at 2x volume against a staging rate limiter. Chaos engineering: kill a Redis shard and verify fail-open behaviour. Synthetic load tests with concurrent clients hitting the same user ID.
Existing Pattern Connections
| Design Decision | Existing Pattern | Relationship |
|---|---|---|
| HTTP 429 response contract | Operational-API-Patterns | Rate-Limiter-Design owns the algorithm; Operational-API-Patterns owns the HTTP contract (429 status, Retry-After, X-RateLimit headers) — scope boundary |
| Redis as distributed state store | Distributed-Cache | Rate limiter counters stored in Redis; same Redis cluster may serve application caching; key namespace isolation (rate_limit: prefix) required |
| Redis failure handling | Circuit-Breaker-Pattern | If Redis becomes unavailable, the rate limiter trips a circuit breaker and fails open or closed based on policy |
| Redis key sharding | Consistent-Hashing | Sharding rate limit keys across Redis Cluster nodes uses consistent hashing to distribute load without hotspots |
| Rate limiting as gateway concern | API-Gateway-Pattern | Rate limiting is a natural API Gateway cross-cutting concern; centralised enforcement removes per-service overhead |
Backlinks
(populated in Phase 32 backlink sweep)