Consistent Hashing

A hash-based key distribution scheme where adding or removing a node remaps only K/N keys on average (not all keys), using a circular hash ring with virtual nodes for uniform distribution.

When NOT to Use

Node count < 5 and cluster is static: modulo hashing is simpler; consistent hashing overhead is not justified. When the cluster never changes, the key remapping cost of modulo hashing is never paid.
Virtual node overhead at very small scale: V=150 virtual nodes on 3 physical nodes means 450 ring entries with minimal rebalancing benefit. The ring data structure overhead outweighs the distribution improvement at this scale.
Sorted scan requirements: When all keys must be accessed deterministically by position (sorted range scans), consistent hashing requires range queries that cross node boundaries, creating multi-node fan-out where modulo hashing would serve a contiguous range from one node.

Core Mechanism

The Problem

Naive modulo hashing assigns keys to nodes via hash(key) % N. This works perfectly with a fixed number of nodes N. The problem is node changes:

Adding a node: N becomes N+1. Almost every key's hash(key) % N changes — approximately (N/(N+1)) of all keys remap to different nodes.
Removing a node: Same result — a cluster-wide key storm.

Consistent hashing solves this: adding or removing a node remaps only key_count / N keys on average. For a 10-node cluster with 1M keys, a node addition remaps ~100K keys, not ~910K.

Hash Ring Mechanics

Both nodes and keys are mapped to positions on a circular ring (conceptually, positions 0–359 in degrees, or 0–2^32 in 32-bit hash implementations). Mapping uses hash(node_id) for nodes and hash(key) for keys.

Node lookup rule: Starting at the key's hash position, walk clockwise around the ring. The first node encountered owns the key.

      N1 (hash=0)
     /            \
  K5               K1
  |                |
  N4 (hash=270)    N2 (hash=90)
     \            /
      K3    K2
       \   /
        N3 (hash=180)

Key lookup: find key's hash position -> walk clockwise -> first node encountered owns the key

Node lookup trace (0–359 degree model; N1=0, N2=90, N3=180, N4=270):

// Key K with hash=45:
// Walk clockwise from 45 -> first node is N2 at 90
// K is assigned to N2

// Remove N2:
// K with hash=45 walks clockwise from 45 -> next node is N3 at 180
// K remaps to N3 (only keys in range [91, 180] are affected)

// Add N5 at position 135:
// K with hash=45: N2 is at 90 -> still assigned to N2 (unchanged)
// Key K2 with hash=100: was N3 (180) -> now N5 (135) [in range 91-135]

The invariant: only keys whose positions fall in the arc between a removed/added node and its predecessor are affected.

Virtual Nodes

The basic ring problem: With only one ring position per physical node, the key space is unevenly divided — a node at hash=0 gets the arc from the previous node all the way to 0, which can be much larger or smaller than average. Worse, removing a node remaps all its keys to a single successor, creating a hotspot.

Solution: Each physical node maps to V virtual nodes on the ring via hash(node_id + ":" + i) for i in 1..V. Each virtual node is an independent ring entry.

V=100-150 is a common production value (AWS DynamoDB, Apache Cassandra)
Removal redistributes keys to multiple successors — one per virtual node affected — avoiding hotspot concentration

Rationale for V=100-150: V virtual nodes spread a single node's key range across V segments of the ring, ensuring removal redistributes to V successors rather than one. Uniformity benefit increases with V (by the law of large numbers, distribution converges toward equal shares), but ring memory overhead also increases linearly with V. Below V=50, distribution is visibly uneven. Above V=200, ring memory overhead grows with diminishing uniformity improvement. V=100-150 is the Cassandra/DynamoDB empirical sweet spot balancing both concerns.

Rebalancing mechanics:

Operation	What happens	Keys moved
Add node	New virtual nodes inserted at computed positions; keys in range [predecessor → new_virtual_node] move to new node	O(key_count / N)
Remove node	Virtual nodes removed; keys remapped to their respective clockwise successors	O(key_count / N)

Net key movement with consistent hashing: O(key_count / N). Net key movement with modulo hashing: O(key_count).

Component Diagram

Consistent-Hashing-diagram.excalidraw

Skeletal Snippet

// Consistent hash ring -- simplified (teaching example, not production)
// Source: Alex Xu, System Design Interview Vol. 1, Ch. 5
class ConsistentHashRing {
  private ring = new Map<number, string>(); // hash -> node_id
  private sortedHashes: number[] = [];
 
  addNode(nodeId: string, virtualNodes = 150): void {
    for (let i = 0; i < virtualNodes; i++) {
      const hash = this.hash(`${nodeId}:${i}`);
      this.ring.set(hash, nodeId);
      this.sortedHashes.push(hash);
    }
    this.sortedHashes.sort((a, b) => a - b);
  }
 
  getNode(key: string): string {
    const keyHash = this.hash(key);
    const idx = this.sortedHashes.findIndex(h => h >= keyHash);
    const ringIdx = idx === -1 ? 0 : idx; // wrap around
    return this.ring.get(this.sortedHashes[ringIdx])!;
  }
 
  private hash(input: string): number {
    // PRODUCTION NOTE: use MurmurHash or xxHash, not this toy polynomial
    let h = 0;
    for (const c of input) h = (h * 31 + c.charCodeAt(0)) >>> 0;
    return h;
  }
}

Teaching example only

The polynomial hash function above is a toy for illustration. Production implementations use MurmurHash, xxHash, or rely on infrastructure-level consistent hashing (Redis Cluster, Cassandra). Do not hand-roll a consistent hash ring for production — virtual node edge cases (hash collision, uneven distribution) require careful implementation; libraries handle this. See the Don't Hand-Roll guidance in the research notes.

Pitfalls

Virtual node count without rationale

"Use 100-150 virtual nodes" is cargo-culting without the distribution uniformity reasoning. V virtual nodes spread a single node's key range across V ring segments. Removal redistributes to V successors instead of one. Below V=50, distribution is visibly uneven. Above V=200, ring memory overhead grows with diminishing uniformity improvement.

Existing Pattern Connections

[[Load-Balancer]] — consistent hashing as a load-balancing algorithm for session-affinity routing without full connection migration on node changes (Phase 29)
[[Database-Sharding]] — hash-based sharding uses consistent hashing to minimize key remapping during shard splits and joins (Phase 29)
[[Distributed-Cache]] — distributed cache clusters (Memcached, Redis Cluster) use consistent hashing for key routing across cache nodes (Phase 29)
binary-search — the sorted-array lookup that finds the first virtual node on the hash ring whose position is >= the key hash; the ring lookup described in this note is a direct application of binary search on the sorted virtual node positions

Consistent Hashing

Consistent Hashing

When NOT to Use

Core Mechanism

The Problem

Hash Ring Mechanics

Virtual Nodes

Component Diagram

Skeletal Snippet

Pitfalls

Existing Pattern Connections

Backlinks

Linked mentions

Consistent Hashing

Tags

Consistent Hashing

When NOT to Use

Core Mechanism

The Problem

Hash Ring Mechanics

Virtual Nodes

Component Diagram

Skeletal Snippet

Pitfalls

Existing Pattern Connections

Backlinks

Linked mentions