Unique ID Generator Design
System that generates globally unique identifiers across distributed nodes without central coordination bottlenecks.
Clarify First
Before designing, lock these assumptions with the interviewer:
- Globally unique or within-cluster? — Cross-datacenter uniqueness requires more coordination (or probabilistic guarantees); within-service unique is easier but limits future scaling.
- K-sortable (time-ordered)? — Snowflake IDs are time-ordered to the millisecond; UUID v4 is random and not sortable; this drives database index efficiency and range query capability.
- Scale? — At 1K IDs/sec, any approach works. At 10K+/sec, DB auto-increment becomes a single-writer bottleneck. At 100K+/sec, Snowflake or UUID v7 with no coordination is required.
- Unpredictable? — Sequential IDs enable enumeration attacks on user-facing resources. Snowflake IDs are partially predictable (time-ordered); UUID v4 is fully random.
- Maximum ID length? — 64-bit integer (Snowflake) fits in a standard bigint column. 128-bit UUID requires varchar(36) or a UUID column type. Some mobile clients and JavaScript environments have issues with 64-bit integers exceeding Number.MAX_SAFE_INTEGER (2^53 - 1), so such IDs are usually sent to clients as strings.
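The 64-bit concern in the last bullet can be checked numerically. JavaScript's Number is an IEEE 754 double, the same representation as a Python float, so the sketch below reproduces the precision loss in Python. The helper names and the sample ID are illustrative assumptions, not part of any particular client library.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # value of JavaScript's Number.MAX_SAFE_INTEGER


def survives_double(n: int) -> bool:
    """Return True if n round-trips through an IEEE 754 double unchanged."""
    return int(float(n)) == n


def to_wire(snowflake_id: int) -> str:
    """Serialize the ID as a JSON string so clients never parse it as a number."""
    return json.dumps({"id": str(snowflake_id)})


# Arbitrary 63-bit value chosen only for illustration (hypothetical ID).
sample_id = (1 << 62) + 1

print(survives_double(MAX_SAFE_INTEGER))  # the largest safe integer is exact
print(survives_double(sample_id))         # a 63-bit ID loses its low bits
print(to_wire(sample_id))
```

Sending the ID as a string keeps the bigint storage benefit on the server while sidestepping client-side rounding.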
Capacity Estimation
Derivation chain for a high-throughput e-commerce platform (2026):
Assumption: high-throughput e-commerce platform, 2026
Peak order events: 100K orders/second (Black Friday peak; matches
Alibaba 11.11 2023 peak range)
ID generation target: 100K IDs/second sustained, 500K/second burst
Snowflake 64-bit anatomy:
[1 sign bit][41-bit timestamp ms][10-bit machine ID][12-bit sequence]
41 bits timestamp (millisecond precision, counted from a custom epoch):
max value = 2^41 - 1 ms ≈ 69.7 years after the epoch
→ with the epoch set to the start of 2026, covers 2026 + 69 = 2095 before rollover
10 bits machine/datacenter ID:
distinct values = 2^10 = 1024 worker nodes (IDs 0-1023)
12 bits sequence number:
distinct values = 2^12 = 4096 IDs per millisecond per node (sequence 0-4095)
Single node throughput:
4096 IDs/ms × 1000 ms/sec = 4.096M IDs/second per worker node
→ single Snowflake worker exceeds our 500K/sec burst requirement
→ no coordination needed at runtime; each worker generates independently
Cross-reference: Capacity-Estimation for the shared estimation methodology.
Conclusion: One Snowflake worker node handles the full 500K/sec burst. For redundancy, deploy 2-4 workers with distinct machine IDs. Total capacity: 8-16M IDs/sec, which is 16-32× headroom over the 500K/sec burst requirement.
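The arithmetic in this section can be reproduced in a few lines. This is a sketch that assumes the 41/10/12 bit layout and the 500K/sec burst target stated above; the constant and function names are illustrative.

```python
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000
BURST_TARGET = 500_000  # IDs/second, from the estimate above

timestamp_years = 2**TIMESTAMP_BITS / MS_PER_YEAR
max_workers = 2**MACHINE_BITS
ids_per_ms = 2**SEQUENCE_BITS
ids_per_sec_per_node = ids_per_ms * 1000


def headroom(workers: int) -> float:
    """Fleet capacity divided by the burst target."""
    return workers * ids_per_sec_per_node / BURST_TARGET


print(f"timestamp range: {timestamp_years:.1f} years")
print(f"max workers: {max_workers}")
print(f"per-node throughput: {ids_per_sec_per_node:,} IDs/sec")
for workers in (1, 2, 4):
    print(f"{workers} worker(s): {headroom(workers):.1f}x burst headroom")
```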
Central Technical Problem
Clock skew in distributed Snowflake ID generation.
NTP (Network Time Protocol) adjustments on a Snowflake worker node can cause the system clock to move backward by milliseconds. If the current millisecond timestamp is less than the last recorded timestamp, Snowflake has a problem: generating an ID with the same (or earlier) timestamp and the same sequence number would produce a duplicate.
Snowflake must halt generation until the clock catches up:
// Snowflake clock skew guard (pseudocode)
if (currentMs < lastMs) {
    skewMs = lastMs - currentMs;
    if (skewMs > MAX_SKEW_MS) {        // threshold, e.g., 10 seconds
        alertAndHalt(skewMs);          // large skew: stop generating IDs
    }
    currentMs = waitUntilMs(lastMs);   // small skew: wait until caught up
}
Small skew (<1 second): spin-wait (busy loop until currentMs >= lastMs). Acceptable because backward corrections on a healthy NTP-synchronized host are normally small, so the stall is short.
Large skew (>10 seconds): alert and halt. This indicates a deeper time synchronization failure. Continuing to generate IDs under large skew risks duplicate IDs even after the wait. Production Snowflake implementations treat this as a fatal operational condition requiring manual intervention.
This is the primary operational failure mode of Snowflake. Systems using Snowflake IDs must monitor clock skew on every worker node as part of their SLO.
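The guard described above can be sketched as a small generator class. This is a minimal illustration rather than Twitter's implementation: the epoch value, the `SnowflakeGenerator` and `ClockMovedBackwards` names, and the injectable `clock_ms` and `sleep` parameters are all assumptions made so the behavior can be exercised with a fake clock.

```python
import threading
import time

EPOCH_MS = 1_767_225_600_000   # assumed custom epoch: 2026-01-01T00:00:00Z
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
MAX_BACKWARD_MS = 10_000       # halt threshold, from the 10-second example


class ClockMovedBackwards(RuntimeError):
    """Raised when the clock regressed further than MAX_BACKWARD_MS."""


def system_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, machine_id, clock_ms=system_clock_ms, sleep=time.sleep):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._clock_ms = clock_ms
        self._sleep = sleep
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def _wait_until(self, target_ms):
        """Poll the clock until it reaches target_ms; return the new reading."""
        now = self._clock_ms()
        while now < target_ms:
            self._sleep(0.0005)
            now = self._clock_ms()
        return now

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                skew = self._last_ms - now
                if skew > MAX_BACKWARD_MS:
                    # Large skew: refuse to generate rather than risk duplicates.
                    raise ClockMovedBackwards(f"clock moved back {skew} ms")
                now = self._wait_until(self._last_ms)  # small skew: wait it out
            if now < EPOCH_MS:
                raise ValueError("clock is before the custom epoch")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next.
                    now = self._wait_until(self._last_ms + 1)
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return ((timestamp << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self._machine_id << SEQUENCE_BITS)
                    | self._sequence)


# Demo with a fake clock so the backward step is reproducible.
fake_now = [EPOCH_MS + 60_000]
demo = SnowflakeGenerator(
    machine_id=7,
    clock_ms=lambda: fake_now[0],
    sleep=lambda _seconds: fake_now.__setitem__(0, fake_now[0] + 1),
)
first = demo.next_id()
fake_now[0] -= 5               # simulate a 5 ms backward adjustment
second = demo.next_id()
print(first < second)          # ordering survives the small backward step
```

Injecting the clock keeps the skew paths testable without touching the host's time settings. A production version would also emit a metric or alert before raising.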
Component Design
Algorithm Comparison
| Approach | Uniqueness Guarantee | Sortable | Coordination Required | Scalability | When NOT to Use |
|---|---|---|---|---|---|
| DB auto-increment | Within-DB unique (single writer) | Yes (monotonic) | Single DB write per ID | Bottleneck at >10K writes/sec | Distributed systems; high write throughput |
| UUID v4 | Probabilistic global unique (122 random bits; negligible collision probability) | No (random) | None | Unlimited (stateless) | When k-sortable IDs required; when 128-bit size is too large for storage or client |
| UUID v7 (RFC 9562, 2024) | Probabilistic global unique | Yes (millisecond timestamp prefix) | None | Unlimited (stateless) | When 64-bit IDs are required; UUID v7 is 128-bit |
| Snowflake (Twitter, 2010) | Guaranteed unique as long as worker IDs are distinct and the clock guard holds | Yes (time-ordered to millisecond) | Worker ID assignment at startup via ZooKeeper/etcd | 4.096M IDs/sec per node | When clocks cannot be kept in sync (skew >10 seconds forces a halt); when 128-bit IDs are acceptable and worker ID management is unwanted (UUID v7 is simpler) |
| Ticket server (Flickr-style) | Guaranteed unique (centralized counter) | Yes (monotonic) | Centralized ticket DB is always required | Bottleneck at ~10K IDs/sec; single point of failure | High availability requirements; high throughput needs |
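To make the UUID v4 and UUID v7 rows concrete, the sketch below builds a UUID v7 by hand from the RFC 9562 field layout (48-bit millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits) and compares its ordering with `uuid.uuid4()`. The hand-rolled `uuid7` helper is an assumption made for portability, since a built-in generator is not available in every Python version.

```python
import os
import time
import uuid


def uuid7(now_ms=None) -> uuid.UUID:
    """Assemble a UUID v7 from the RFC 9562 field layout (illustrative helper)."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (now_ms & ((1 << 48) - 1)) << 80      # unix_ts_ms
    value |= 0x7 << 76                            # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


earlier = uuid7(now_ms=1_000)
later = uuid7(now_ms=2_000)
print(earlier.version, str(earlier) < str(later))  # timestamp prefix gives ordering

random_id = uuid.uuid4()
print(random_id.version)  # v4 carries no timestamp, so no useful ordering
```

Two v7 values created in the same millisecond are ordered only by their random bits, so the ordering is k-sortable rather than strictly monotonic.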
Datacenter Awareness
Snowflake's 10-bit machine ID is typically split:
10-bit machine ID = [5 bits: datacenter ID][5 bits: worker ID within datacenter]
= 2^5 datacenters × 2^5 workers = 32 × 32 = 1024 total workers
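The 5/5 split can be expressed as shifts and masks. The sketch below assumes the 41/5/5/12 layout described in this note; the `pack` and `unpack` names are illustrative.

```python
TIMESTAMP_BITS = 41
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12

WORKER_SHIFT = SEQUENCE_BITS                          # 12
DATACENTER_SHIFT = WORKER_SHIFT + WORKER_BITS         # 17
TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS  # 22


def _check(name: str, value: int, bits: int) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def pack(timestamp_ms: int, datacenter_id: int, worker_id: int, sequence: int) -> int:
    """Combine the four fields into one 63-bit non-negative integer."""
    _check("timestamp_ms", timestamp_ms, TIMESTAMP_BITS)
    _check("datacenter_id", datacenter_id, DATACENTER_BITS)
    _check("worker_id", worker_id, WORKER_BITS)
    _check("sequence", sequence, SEQUENCE_BITS)
    return ((timestamp_ms << TIMESTAMP_SHIFT)
            | (datacenter_id << DATACENTER_SHIFT)
            | (worker_id << WORKER_SHIFT)
            | sequence)


def unpack(snowflake_id: int) -> tuple:
    """Return (timestamp_ms, datacenter_id, worker_id, sequence)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> DATACENTER_SHIFT) & ((1 << DATACENTER_BITS) - 1),
        (snowflake_id >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        snowflake_id & ((1 << SEQUENCE_BITS) - 1),
    )


example = pack(timestamp_ms=1, datacenter_id=3, worker_id=5, sequence=9)
print(example, unpack(example))
```

Here `timestamp_ms` is milliseconds since the custom epoch, not Unix time. The largest value that can be packed is 2^63 - 1, which is why the sign bit stays zero.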
Worker IDs must be assigned at startup via a coordination service. Two common approaches:
- ZooKeeper / etcd: worker registers at startup, claims a unique worker ID, renews lease; on failure, lease expires and ID can be reclaimed. Adds operational dependency.
- Static configuration: hard-coded per deployment unit (Kubernetes pod annotation or environment variable). Simpler, but requires discipline in deployment tooling to prevent duplicate IDs when deploying new nodes.
No two workers in the same datacenter may share the same worker ID — duplicates cause silent ID collisions with no runtime error.
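For the static-configuration approach, one way to enforce this rule is a pre-deploy check over the planned assignments. The sketch below is a hypothetical validation step: the `find_duplicate_assignments` function and the pod names are made up for illustration, and the input is assumed to be a mapping from node name to a (datacenter ID, worker ID) pair.

```python
from collections import defaultdict


def find_duplicate_assignments(assignments: dict) -> dict:
    """Map each (datacenter_id, worker_id) claimed more than once to its nodes."""
    nodes_by_id = defaultdict(list)
    for node, (datacenter_id, worker_id) in assignments.items():
        # Both fields are 5 bits wide in the layout above.
        if not (0 <= datacenter_id < 32 and 0 <= worker_id < 32):
            raise ValueError(f"{node}: IDs must be in the range 0-31")
        nodes_by_id[(datacenter_id, worker_id)].append(node)
    return {key: sorted(nodes)
            for key, nodes in nodes_by_id.items() if len(nodes) > 1}


# Hypothetical fleet: pod-c reuses the pair already given to pod-a.
fleet = {
    "pod-a": (0, 1),
    "pod-b": (0, 2),
    "pod-c": (0, 1),
    "pod-d": (1, 1),
}
print(find_duplicate_assignments(fleet))
```

Running a check like this in the deployment pipeline turns a silent runtime collision into a failed deploy.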
System Diagram
Unique-ID-Generator-Design-diagram.excalidraw
Alternatives Considered
| Decision | Alternative A | Alternative B | Why Chosen Approach Wins |
|---|---|---|---|
| Snowflake for high-throughput | DB auto-increment | UUID v4 | 4.096M IDs/sec per node; 64-bit; k-sortable for DB index efficiency; no runtime coordination |
| UUID v7 for no-coordination contexts | UUID v4 | Snowflake | RFC 9562 (2024); time-ordered = better DB index locality; no worker ID management overhead; choose when 128-bit size is acceptable |
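| ZooKeeper for worker ID assignment | Static config | etcd | ZooKeeper ephemeral nodes expire with the worker's session, so IDs are reclaimed automatically; etcd leases give equivalent behavior, so prefer whichever the platform already operates; static config is simpler but operationally fragile on node replacement |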
| Ticket server (Flickr) | DB auto-increment | Snowflake | Ticket server accepts a coordination bottleneck in exchange for strict monotonicity; acceptable for low-throughput systems needing guaranteed order |
Recommended defaults:
- High-throughput k-sortable (>10K IDs/sec): Snowflake
- Low-coordination stateless (any throughput, 128-bit acceptable): UUID v7
- Simple single-DB application (<10K writes/sec): DB auto-increment
Likely Follow-Up Questions
- What happens when NTP adjustment exceeds 10 seconds? — Production Snowflake implementations alert and halt ID generation, requiring manual clock correction or node restart. Alternatives: pre-generate a buffer of IDs ahead of time, or fall back to UUID v7 temporarily.
- How do you assign worker IDs without ZooKeeper? — Use static Kubernetes pod annotations or environment variables set at deploy time. Requires a registry (even a simple config map) to ensure no two workers share an ID across the entire fleet.
- Why not UUID v7 instead of Snowflake? — UUID v7 is 128-bit vs Snowflake's 64-bit. A bigint Snowflake ID fits in 8 bytes and indexes efficiently in B-tree structures. A UUID v7 requires 16 bytes. For databases with billions of records, the index size difference is significant.
- How do you handle multi-region deployments? — The 5-bit datacenter ID provides 32 distinct datacenter slots. Each region gets a datacenter ID range. Workers within the region use the worker ID bits. Cross-region uniqueness is guaranteed structurally, not via coordination.
- What if you need more than 4096 IDs per millisecond per node? — Deploy additional Snowflake workers (up to 1024 total). Alternatively, re-partition the bit layout: use fewer machine ID bits and more sequence bits for extreme throughput at the cost of fewer workers.
- How do you prevent ID enumeration for user-facing IDs? — Snowflake IDs are time-ordered and partially guessable. For user-facing IDs where enumeration is a security concern, use UUID v4 (fully random) or apply a reversible obfuscation (e.g., Hashids) to the Snowflake ID at the API boundary.
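The bit re-partitioning answer above is a trade between worker count and per-node throughput within the 22 non-timestamp bits. A small sketch of that trade-off follows; the `layout_capacity` function and the 6/16 split are illustrative assumptions, not a recommended layout.

```python
NON_TIMESTAMP_BITS = 22  # 64 bits minus 1 sign bit and 41 timestamp bits


def layout_capacity(machine_bits: int, sequence_bits: int) -> dict:
    """Worker count and per-node throughput for a given bit split."""
    if machine_bits + sequence_bits != NON_TIMESTAMP_BITS:
        raise ValueError("machine and sequence bits must total 22")
    return {
        "max_workers": 2**machine_bits,
        "ids_per_sec_per_node": 2**sequence_bits * 1000,
    }


print(layout_capacity(10, 12))  # the default layout used in this note
print(layout_capacity(6, 16))   # fewer workers, more IDs per millisecond
```

Any change to the split has to be made before the first ID is issued, because existing IDs are decoded with the layout they were created under.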
Existing Pattern Connections
| Design Decision | Existing Pattern | Relationship |
|---|---|---|
| Worker ID coordination at startup | Bounded-Context | Each datacenter/service domain owns its worker ID range; the coordination boundary mirrors bounded context isolation — no cross-context worker ID sharing |
| Ticket server as centralised counter | Singleton-Pattern | Ticket server is a globally unique instance; inherits Singleton's testing pitfall (hard to parallelize) and availability pitfall (single point of failure) |
| UUID v4/v7 as AP choice | CAP-Theorem | UUID generation requires no coordination = partition tolerant; trades globally ordered IDs (consistency) for availability — a deliberate AP tradeoff |
| Snowflake as CP choice | CAP-Theorem | Snowflake halts generation on clock skew rather than risk duplicate IDs — consistency over availability; the explicit CAP tradeoff for sortable guaranteed-unique IDs |