HyperLogLog

A probabilistic cardinality estimation algorithm that answers "how many distinct elements are in a stream?" using O(log log n) space — enabling unique-count queries on billions of elements with ~1% error at just 12 KB of memory (Redis implementation).

When NOT to Use

When exact counts are required (financial transactions, compliance reporting) — HyperLogLog has inherent error (~0.81% at 16384 registers). A 0.81% error on 10 million users means an uncertainty of ±81,000 — unacceptable when regulatory audits require precise figures.
When you need to answer membership queries ("is element X in the set?") — use Bloom-Filter instead. HyperLogLog cannot test whether a specific element was seen; it can only estimate how many distinct elements were seen. Bloom Filter solves membership; HyperLogLog solves cardinality.
When the cardinality is small (< 1000 elements) — a simple HashSet is cheaper, exact, and easier to reason about. HyperLogLog's value emerges at millions+ cardinality where storing all elements would require gigabytes of memory.

Core Mechanism

How It Works

Hash function maps each incoming element to a binary string (e.g., hash("user_42") = 00101101...).
Extract register index from the first k bits of the hash → selects which of the 2^k registers (buckets) to update.
Count leading zeros (the rho function, ρ) in the remaining bits of the hash. More leading zeros = rarer event = evidence of larger cardinality.
Update register M[j] = max(M[j], ρ) — each register tracks the maximum leading-zero count seen so far for that bucket.
Estimate cardinality using the harmonic mean across all 2^k registers: the harmonic mean corrects for outlier buckets that happened to see an element with many leading zeros.
Apply bias correction constant α_m to remove systematic bias in the harmonic mean estimator.

Error Rate

Standard error = 1.04 / sqrt(m) where m = number of registers.

Registers (m)	Error rate	Memory (at 6 bits/register)
256	~6.5%	~192 bytes
1024	~3.25%	~768 bytes
4096	~1.63%	~3 KB
16384	~0.81%	~12 KB (Redis default)

Redis uses 16384 registers (12 KB) to achieve ~0.81% standard error regardless of whether the cardinality is 1000 or 1 billion.

Component Diagram

flowchart TD
    A[Input element] --> B[Hash function h]
    B --> C["Binary string e.g. 00101..."]
    C --> D["Extract register index\nfirst k bits → bucket j"]
    C --> E["Count leading zeros\nin remaining bits → ρ"]
    D --> F["Register M_j"]
    E --> F
    F --> G{"ρ > M_j?"}
    G -->|Yes| H["Update M_j = ρ"]
    G -->|No| I[No update]
    H --> J["Harmonic mean of\nall 2^k registers"]
    I --> J
    J --> K["Apply bias correction\nalpha_m constant"]
    K --> L["Cardinality estimate n̂"]

System Design Application

Redis PFADD / PFCOUNT — Redis natively implements HyperLogLog as a first-class data type. PFADD key element adds an element; PFCOUNT key returns the estimated distinct count. The 12 KB maximum footprint is fixed regardless of cardinality.

Common use cases in production systems:

Unique visitor counts: "How many distinct users visited this page today?" — streaming billions of events; storing each user ID would require gigabytes. HyperLogLog counts distinct users at ~12 KB per counter.
Distinct query counts: Log all search queries; count unique queries per hour without storing each query string. Used in search analytics pipelines.
Distinct item views in analytics: "How many unique users viewed product SKU-123 this week?" — a per-product HyperLogLog counter, merged across shards with PFMERGE.

The infrastructure layer where these Redis HyperLogLog counters live is described in Distributed-Cache — Redis is a distributed in-memory cache/store that exposes PFADD/PFCOUNT as native commands.

Scope Boundaries

NOT the same as Bloom-Filter — Bloom Filter answers "is this specific element in the set?" (membership query); HyperLogLog answers "how many distinct elements have been seen in total?" (cardinality estimation). These solve different problems with different space complexity characteristics. A Bloom Filter retains enough information to test individual elements; HyperLogLog discards element identity after hashing and can only estimate a count.
NOT the same as Distributed-Cache — HyperLogLog is a data structure stored IN a Redis instance; Distributed-Cache describes the caching infrastructure layer (cache-aside, write-through, TTL, cache stampede mitigation, Redis vs Memcached selection). HyperLogLog is one of the data types Redis exposes — not a description of the cache infrastructure itself.

Skeletal Snippet

// PRODUCTION NOTE: use Redis PFADD/PFCOUNT via ioredis — this snippet is for illustration only
// import Redis from 'ioredis'; const redis = new Redis(); await redis.pfadd('visitors:2026-03-26', userId);
 
// Conceptual: leading-zeros counting (the rho function)
function countLeadingZeros(n: number, numBits = 32): number {
  if (n === 0) return numBits;
  let count = 0;
  for (let bit = numBits - 1; bit >= 0; bit--) {
    if ((n >>> bit) & 1) break;
    count++;
  }
  return count;
}
 
// Conceptual: register-based cardinality estimator (simplified, no bias correction)
class HyperLogLogSketch {
  private registers: number[];
  private k: number; // number of bits for register index
 
  constructor(k = 10) { // 2^10 = 1024 registers
    this.k = k;
    this.registers = new Array(1 << k).fill(0);
  }
 
  add(hashedValue: number): void {
    const registerIndex = hashedValue >>> (32 - this.k); // first k bits
    const remainingBits = hashedValue << this.k;
    const rho = countLeadingZeros(remainingBits, 32 - this.k) + 1;
    this.registers[registerIndex] = Math.max(this.registers[registerIndex], rho);
  }
 
  estimate(): number {
    const m = this.registers.length;
    // Harmonic mean of 2^register_value across all registers
    const harmonicSum = this.registers.reduce((sum, r) => sum + Math.pow(2, -r), 0);
    const alphaMSquared = 0.7213 / (1 + 1.079 / m) * m * m; // alpha_m * m^2
    return alphaMSquared / harmonicSum;
    // NOTE: production HyperLogLog++ includes small/large cardinality corrections
  }
}

Pitfalls

HyperLogLog++ bias correction for small cardinalities

The basic HyperLogLog algorithm overestimates cardinality for small sets (< 2.5 * m elements) and has precision limits at very large cardinalities approaching 2^32. Redis implements HyperLogLog++ (the extended algorithm by Heule, Nunkesser, and Hall, 2013) which includes small-range and large-range corrections. The snippet above is conceptual — it does NOT include these corrections. Always use Redis PFADD/PFCOUNT in production.

PFMERGE computes union, not intersection

PFMERGE dest key1 key2 merges two HyperLogLog registers into a union estimate (|A ∪ B|). It does NOT compute intersection cardinality (|A ∩ B|). There is no direct HyperLogLog operation for intersection. Approximations require inclusion-exclusion: |A ∩ B| ≈ |A| + |B| - |A ∪ B|, but this accumulates estimation errors and should not be used for small intersections where error dominates the signal.

Existing Pattern Connections

Note	Relationship
Bloom-Filter	Both are probabilistic space-efficient data structures; HyperLogLog counts distinct elements, Bloom Filter tests membership — different problems, different space complexity
Distributed-Cache	Redis stores HyperLogLog as a native data type (PFADD/PFCOUNT); Distributed-Cache describes the infrastructure layer where these operations execute

Backlinks

Algorithms-MOC — listed in System Design Application Index: O(log log n) cardinality estimation via Redis PFADD/PFCOUNT
Distributed-Cache — Redis stores HyperLogLog as a native data type; PFADD/PFCOUNT commands operate on these registers
Bloom-Filter — both are probabilistic space-efficient data structures; Bloom Filter answers membership, HyperLogLog answers cardinality

HyperLogLog

Tags

HyperLogLog

When NOT to Use

Core Mechanism

How It Works

Error Rate

Component Diagram

System Design Application

Scope Boundaries

Skeletal Snippet

Pitfalls

Existing Pattern Connections

Backlinks

Linked mentions