Rate Limiter Design

Rate Limiter Design

A system that controls the rate of requests a client can send to a service, protecting backend resources from overload and ensuring fair usage across consumers.

Scope note: This note covers system design of a rate limiter — the algorithms, distributed state management, and architectural placement decisions. The HTTP contract layer (429 status, Retry-After header, X-RateLimit response headers, client backoff) is documented in Operational-API-Patterns. This note does not re-explain HTTP response semantics.

Clarify First

Before designing, lock these assumptions — each shapes a different component:

  1. Per-user, per-IP, or per-API-key? Different key granularity drives the storage schema; per-user requires authenticated identity resolution before the counter lookup.
  2. Hard or soft limit? Hard: requests above the limit are rejected immediately. Soft: allow burst above limit with degraded SLA or queuing.
  3. What actions are rate limited? All endpoints, or only specific high-cost operations (e.g., auth, file upload, third-party calls)?
  4. Client-side or server-side? Throttle at API gateway, service middleware, or client library? Placement changes where state lives and who enforces the policy.
  5. Distributed system or single server? Single server is trivial (in-memory counter). Distributed is the interesting problem — the core design challenge.

Capacity Estimation

Assumption: rate limiter as a shared service for a mid-size API platform, 2026.

API requests to limit:        10M QPS total across all clients
Rate limit decisions:         1 decision per API request = 10M decisions/sec
Redis ops per decision:       ~2 ops (INCR + EXPIRE, or sliding window lookups)
  Required Redis throughput:  10M * 2 = 20M ops/sec
  Redis per instance:         ~100K–500K ops/sec (commodity hardware, 2026)
  Required instances:         20M / 300K ≈ 67 instances → Redis Cluster with N shards

Storage per rate limit key:
  key = "rate_limit:{user_id}:{window}" ≈ 50 bytes
  Active concurrent users:    1M
  Storage:                    1M * 50 bytes = 50 MB
  → fits in a single Redis instance; sharding is for throughput, not storage

Cross-link: Capacity-Estimation for the shared derivation methodology.

Central Technical Problem

Distributed state consistency across rate limiter nodes.

When a rate limiter runs on N stateless application servers, each node has only its local counter. A user sending 10 concurrent requests may distribute across 10 nodes, each seeing count=1 against a limit of 5 — resulting in 10 allowed requests instead of 5.

Three solutions with distinct tradeoffs:

  1. Centralized counter (Redis INCR): Atomic counter in shared Redis. Every request pays a network round-trip to Redis. Redis becomes a single point of failure unless clustered. This is the standard production approach.

  2. Sticky routing: Route each client_id to a dedicated rate limiter node via consistent hashing. Eliminates shared state but breaks on node failure — the client's counter is lost and limits reset. Prevents horizontal scale of the rate limiter itself.

  3. Eventual consistency with relaxed guarantees: Accept slight over-counting (nodes sync periodically). Suitable for soft limits where ~3% over-counting is acceptable. Avoids Redis as a bottleneck at the cost of enforcement accuracy.

Component Design

Algorithm Comparison

AlgorithmHow It WorksMemoryAccuracyBurst Handling
Fixed window counterIncrement counter per window (e.g., 1-minute slots); reset at boundaryLow (1 int per key/window)Boundary exploit: 2x burst possible straddling window resetAllows burst at window boundary
Sliding window logStore timestamp of every request; count requests in last 60sHigh (1 entry per request)Exact — no boundary exploitSmooth; high memory cost
Sliding window counterBlend current and previous window counters with time-decay weightLow (2 ints per key)~0.003% error (Redis Labs study)Good approximation of smooth
Token bucketTokens accumulate at fixed rate up to capacity; each request consumes 1 tokenLow (token count + last_refill_time)Smooth; burst allowed up to bucket capacityNatural burst tolerance
Leaky bucketFixed-rate queue drain; excess requests queued or droppedLowSmooth output; bursty input absorbed into queueQueue-based absorption

Redis Implementation Patterns

// Fixed window — INCR + EXPIRE (note: INCR and EXPIRE are not atomic together)
key = "rate_limit:{user_id}:{floor(timestamp / window_seconds)}"
count = INCR key
IF count == 1: EXPIRE key window_seconds   // race window: INCR succeeds, EXPIRE may not
IF count > limit: reject request

// Sliding window log — ZADD + ZCARD (sorted set of timestamps)
key = "rate_limit:{user_id}"
now = current_unix_ms
ZADD key now now                                 // add current timestamp
ZREMRANGEBYSCORE key 0 (now - window_ms)         // expire old entries
count = ZCARD key
IF count > limit: reject request

// Atomic INCR+EXPIRE via Lua script (eliminates the fixed-window race condition)
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count

The Lua script runs as a single atomic operation in Redis — no other command can interleave between INCR and EXPIRE.

Rate Limiter Placement

PlacementProsCons
API GatewaySingle enforcement point; no per-service work; cross-cutting concerns centralisedGateway becomes a bottleneck; coarse granularity (per-route, not per-operation)
Service-level middlewareFine-grained per-endpoint limits; no gateway dependencyDuplicated enforcement logic across services; each service manages its own Redis connection pool
Client-sideZero server loadUnenforceable — clients can ignore or bypass

System Diagram

Rate-Limiter-Design-diagram.excalidraw

Alternatives Considered

DecisionAlternativeWhy Chosen Approach Wins
Centralized Redis counterLocal in-memory counters per nodeLocal counters allow over-counting in distributed deployments; Redis provides atomic enforcement
Lua script for INCR+EXPIRETwo separate Redis callsSeparate calls have a race window; Lua script is atomic at the Redis server level
Token bucket algorithmFixed window counterToken bucket allows natural bursts up to bucket capacity without boundary exploits; fixed window double-counts at window resets
API Gateway placementPer-service middlewareGateway centralises enforcement and removes per-service boilerplate; tradeoff is gateway becoming a bottleneck
Redis Cluster sharding via consistent hashingReplicated single Redis nodeSingle node becomes throughput bottleneck at 10M QPS; consistent hashing distributes key space without hotspots

Likely Follow-Up Questions

  1. What happens when Redis goes down? Fail-open (allow all traffic — protect availability) vs fail-closed (block all traffic — protect the backend). The choice is a business decision. Circuit-Breaker-Pattern governs the Redis fallback behaviour.

  2. How do you handle rate limiting across multiple datacenters? Options: global counter via cross-datacenter Redis replication (high latency, strong consistency) vs per-datacenter counters (low latency, slight over-counting globally). Most systems accept per-datacenter enforcement for performance.

  3. How do you support different rate limits for different API tiers? Rule store: a separate service or config layer maps {client_id, endpoint}{limit, window}. The rate limiter looks up the rule before applying the counter.

  4. How do you avoid Redis as a bottleneck at 10M+ QPS? Shard the key space across a Redis Cluster using Consistent-Hashing — each shard handles a partition of client_id space. Add local in-memory caching for rule lookups (immutable data).

  5. How do you test rate limiting in production? Shadow traffic: replay production traffic at 2x volume against a staging rate limiter. Chaos engineering: kill a Redis shard and verify fail-open behaviour. Synthetic load tests with concurrent clients hitting the same user ID.

Existing Pattern Connections

Design DecisionExisting PatternRelationship
HTTP 429 response contractOperational-API-PatternsRate-Limiter-Design owns the algorithm; Operational-API-Patterns owns the HTTP contract (429 status, Retry-After, X-RateLimit headers) — scope boundary
Redis as distributed state storeDistributed-CacheRate limiter counters stored in Redis; same Redis cluster may serve application caching; key namespace isolation (rate_limit: prefix) required
Redis failure handlingCircuit-Breaker-PatternIf Redis becomes unavailable, the rate limiter trips a circuit breaker and fails open or closed based on policy
Redis key shardingConsistent-HashingSharding rate limit keys across Redis Cluster nodes uses consistent hashing to distribute load without hotspots
Rate limiting as gateway concernAPI-Gateway-PatternRate limiting is a natural API Gateway cross-cutting concern; centralised enforcement removes per-service overhead

(populated in Phase 32 backlink sweep)