Rate Limiter Design

A system that controls the rate of requests a client can send to a service, protecting backend resources from overload and ensuring fair usage across consumers.

Scope note: This note covers system design of a rate limiter — the algorithms, distributed state management, and architectural placement decisions. The HTTP contract layer (429 status, Retry-After header, X-RateLimit response headers, client backoff) is documented in Operational-API-Patterns. This note does not re-explain HTTP response semantics.

Clarify First

Before designing, lock these assumptions — each shapes a different component:

Per-user, per-IP, or per-API-key? Different key granularity drives the storage schema; per-user requires authenticated identity resolution before the counter lookup.
Hard or soft limit? Hard: requests above the limit are rejected immediately. Soft: allow burst above limit with degraded SLA or queuing.
What actions are rate limited? All endpoints, or only specific high-cost operations (e.g., auth, file upload, third-party calls)?
Client-side or server-side? Throttle at API gateway, service middleware, or client library? Placement changes where state lives and who enforces the policy.
Distributed system or single server? Single server is trivial (in-memory counter). Distributed is the interesting problem — the core design challenge.

Capacity Estimation

Assumption: rate limiter as a shared service for a mid-size API platform, 2026.

API requests to limit:        10M QPS total across all clients
Rate limit decisions:         1 decision per API request = 10M decisions/sec
Redis ops per decision:       ~2 ops (INCR + EXPIRE, or sliding window lookups)
  Required Redis throughput:  10M * 2 = 20M ops/sec
  Redis per instance:         ~100K–500K ops/sec (commodity hardware, 2026)
  Required instances:         20M / 300K ≈ 67 instances → Redis Cluster with N shards

Storage per rate limit key:
  key = "rate_limit:{user_id}:{window}" ≈ 50 bytes
  Active concurrent users:    1M
  Storage:                    1M * 50 bytes = 50 MB
  → fits in a single Redis instance; sharding is for throughput, not storage

Cross-link: Capacity-Estimation for the shared derivation methodology.

Central Technical Problem

Distributed state consistency across rate limiter nodes.

When a rate limiter runs on N stateless application servers, each node has only its local counter. A user sending 10 concurrent requests may distribute across 10 nodes, each seeing count=1 against a limit of 5 — resulting in 10 allowed requests instead of 5.

Three solutions with distinct tradeoffs:

Centralized counter (Redis INCR): Atomic counter in shared Redis. Every request pays a network round-trip to Redis. Redis becomes a single point of failure unless clustered. This is the standard production approach.
Sticky routing: Route each client_id to a dedicated rate limiter node via consistent hashing. Eliminates shared state but breaks on node failure — the client's counter is lost and limits reset. Prevents horizontal scale of the rate limiter itself.
Eventual consistency with relaxed guarantees: Accept slight over-counting (nodes sync periodically). Suitable for soft limits where ~3% over-counting is acceptable. Avoids Redis as a bottleneck at the cost of enforcement accuracy.

Component Design

Algorithm Comparison

Algorithm	How It Works	Memory	Accuracy	Burst Handling
Fixed window counter	Increment counter per window (e.g., 1-minute slots); reset at boundary	Low (1 int per key/window)	Boundary exploit: 2x burst possible straddling window reset	Allows burst at window boundary
Sliding window log	Store timestamp of every request; count requests in last 60s	High (1 entry per request)	Exact — no boundary exploit	Smooth; high memory cost
Sliding window counter	Blend current and previous window counters with time-decay weight	Low (2 ints per key)	~0.003% error (Redis Labs study)	Good approximation of smooth
Token bucket	Tokens accumulate at fixed rate up to capacity; each request consumes 1 token	Low (token count + last_refill_time)	Smooth; burst allowed up to bucket capacity	Natural burst tolerance
Leaky bucket	Fixed-rate queue drain; excess requests queued or dropped	Low	Smooth output; bursty input absorbed into queue	Queue-based absorption

Redis Implementation Patterns

// Fixed window — INCR + EXPIRE (note: INCR and EXPIRE are not atomic together)
key = "rate_limit:{user_id}:{floor(timestamp / window_seconds)}"
count = INCR key
IF count == 1: EXPIRE key window_seconds   // race window: INCR succeeds, EXPIRE may not
IF count > limit: reject request

// Sliding window log — ZADD + ZCARD (sorted set of timestamps)
key = "rate_limit:{user_id}"
now = current_unix_ms
ZADD key now now                                 // add current timestamp
ZREMRANGEBYSCORE key 0 (now - window_ms)         // expire old entries
count = ZCARD key
IF count > limit: reject request

// Atomic INCR+EXPIRE via Lua script (eliminates the fixed-window race condition)
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count

The Lua script runs as a single atomic operation in Redis — no other command can interleave between INCR and EXPIRE.

Rate Limiter Placement

Placement	Pros	Cons
API Gateway	Single enforcement point; no per-service work; cross-cutting concerns centralised	Gateway becomes a bottleneck; coarse granularity (per-route, not per-operation)
Service-level middleware	Fine-grained per-endpoint limits; no gateway dependency	Duplicated enforcement logic across services; each service manages its own Redis connection pool
Client-side	Zero server load	Unenforceable — clients can ignore or bypass

System Diagram

Rate-Limiter-Design-diagram.excalidraw

Alternatives Considered

Decision	Alternative	Why Chosen Approach Wins
Centralized Redis counter	Local in-memory counters per node	Local counters allow over-counting in distributed deployments; Redis provides atomic enforcement
Lua script for INCR+EXPIRE	Two separate Redis calls	Separate calls have a race window; Lua script is atomic at the Redis server level
Token bucket algorithm	Fixed window counter	Token bucket allows natural bursts up to bucket capacity without boundary exploits; fixed window double-counts at window resets
API Gateway placement	Per-service middleware	Gateway centralises enforcement and removes per-service boilerplate; tradeoff is gateway becoming a bottleneck
Redis Cluster sharding via consistent hashing	Replicated single Redis node	Single node becomes throughput bottleneck at 10M QPS; consistent hashing distributes key space without hotspots

Likely Follow-Up Questions

What happens when Redis goes down? Fail-open (allow all traffic — protect availability) vs fail-closed (block all traffic — protect the backend). The choice is a business decision. Circuit-Breaker-Pattern governs the Redis fallback behaviour.
How do you handle rate limiting across multiple datacenters? Options: global counter via cross-datacenter Redis replication (high latency, strong consistency) vs per-datacenter counters (low latency, slight over-counting globally). Most systems accept per-datacenter enforcement for performance.
How do you support different rate limits for different API tiers? Rule store: a separate service or config layer maps {client_id, endpoint} → {limit, window}. The rate limiter looks up the rule before applying the counter.
How do you avoid Redis as a bottleneck at 10M+ QPS? Shard the key space across a Redis Cluster using Consistent-Hashing — each shard handles a partition of client_id space. Add local in-memory caching for rule lookups (immutable data).
How do you test rate limiting in production? Shadow traffic: replay production traffic at 2x volume against a staging rate limiter. Chaos engineering: kill a Redis shard and verify fail-open behaviour. Synthetic load tests with concurrent clients hitting the same user ID.

Existing Pattern Connections

Design Decision	Existing Pattern	Relationship
HTTP 429 response contract	Operational-API-Patterns	Rate-Limiter-Design owns the algorithm; Operational-API-Patterns owns the HTTP contract (429 status, Retry-After, X-RateLimit headers) — scope boundary
Redis as distributed state store	Distributed-Cache	Rate limiter counters stored in Redis; same Redis cluster may serve application caching; key namespace isolation (`rate_limit:` prefix) required
Redis failure handling	Circuit-Breaker-Pattern	If Redis becomes unavailable, the rate limiter trips a circuit breaker and fails open or closed based on policy
Redis key sharding	Consistent-Hashing	Sharding rate limit keys across Redis Cluster nodes uses consistent hashing to distribute load without hotspots
Rate limiting as gateway concern	API-Gateway-Pattern	Rate limiting is a natural API Gateway cross-cutting concern; centralised enforcement removes per-service overhead

Backlinks

(populated in Phase 32 backlink sweep)

Rate Limiter Design

Tags

Rate Limiter Design

Clarify First

Capacity Estimation

Central Technical Problem

Component Design

Algorithm Comparison

Redis Implementation Patterns

Rate Limiter Placement

System Diagram

Alternatives Considered

Likely Follow-Up Questions

Existing Pattern Connections

Backlinks

Linked mentions