Search Autocomplete Design
Search Autocomplete Design
System that suggests top-K query completions as the user types each character, balancing precomputed prefix indices against data freshness from query log aggregation.
Clarify First
Before designing, lock these assumptions with the interviewer:
- Top-K value? — 5, 10, or 20 suggestions changes index storage and response payload size; affects mobile vs desktop UX tradeoffs.
- Freshness requirement? — Hourly batch aggregation acceptable, or does the system need real-time trending (e.g., breaking news terms appearing in completions within minutes)?
- Multi-language support? — Single locale (ASCII only) vs global Unicode normalization; CJK tokenization requires fundamentally different prefix index design.
- Personalisation? — Global popularity ranking vs per-user history weighting; personalised ranking requires per-user state storage and significantly complicates the read path.
- Scale target? — 1B queries/day (mid-tier) vs 10B queries/day (Google-scale); changes whether a distributed trie or a single-node sorted-set store is required.
Capacity Estimation
Derivation chain for Google-scale autocomplete (2026):
Assumption: 10B search queries/day (Google-scale, 2026 estimate)
query_QPS = 10B / 86,400 ~ 115,700 QPS
autocomplete_requests_per_query ~ 5 (user types 5 chars before selecting)
autocomplete_QPS = 115,700 x 5 ~ 578,500 QPS
Response size:
top-10 completions x 30 bytes each = 300 bytes per response
bandwidth = 578,500 x 300 bytes ~ 174 MB/s outbound
Trie storage:
English vocabulary: ~100K unique words
average prefix length: 5
trie nodes: ~100K x 5 = 500K nodes
per-node storage (top-10 completions + freq): ~500 bytes
total trie: 500K x 500 bytes = 250 MB -- fits in memory of a single server
-> Distributed trie or consistent-hash sharding required only at Google scale
Cross-reference: Capacity-Estimation for the shared DAU-to-QPS-to-storage methodology.
Conclusion: At 578,500 QPS for autocomplete requests, an in-memory cache layer in front of the trie/sorted-set store is mandatory. The 250 MB trie fits on a single server for the English corpus but requires Consistent-Hashing-based prefix sharding at Google scale.
Central Technical Problem
Top-K precomputation pipeline latency vs freshness tradeoff.
Serving prefix completions from a live trie at query time is fast, but updating the trie with real-time query frequency data is expensive. Batching updates reduces overhead but introduces staleness -- top completions may be hours or days old.
Trie vs Redis Sorted Set
Trie data structure:
- Stores prefixes as tree paths; each node stores top-K completions for that prefix.
- O(p) lookup where p = prefix length -- fast read path.
- High memory usage; full trie for a large corpus is hundreds of megabytes to gigabytes.
- Difficult to update incrementally -- typically rebuilt from a batch job. Any partial update risks leaving the trie in an inconsistent state during traversal.
- Standard teaching structure (Alex Xu Vol. 1, Ch. 13) but not the production-standard approach.
Redis sorted set per prefix:
- Each prefix key maps to a sorted set of (completion, frequency) pairs.
- Range query
ZREVRANGE prefix 0 K-1 WITHSCORESreturns top-K by score in O(log N + K). - Incremental update via
ZINCRBY prefix score completion-- no full rebuild required. - High key count (~millions of prefix keys for a large corpus) but operationally simpler than batch trie rebuilds.
Decision: Redis sorted sets are the standard production approach for top-K autocomplete. Trie is the canonical teaching structure but requires periodic full rebuild from batch, making it harder to keep fresh. See Distributed-Cache for the cache-aside layer placed in front of either store.
Precomputation Pipeline
The core tradeoff: every reduction in batch frequency improves freshness but increases load on the aggregation pipeline and trie rebuild time.
[Query Log] --> [Log Aggregator (hourly batch)] --> [Top-K Reducer] --> [Trie / Sorted Set Store]
|
[Autocomplete API] <-- cache layer
- Freshness window = batch frequency. Hourly batch means completions are up to 1 hour stale.
- Daily batch (simpler ops) means completions can be 24 hours stale -- unacceptable for news or trending events.
- Real-time trending (Twitter "trending now") requires streaming aggregation via Kafka + Flink, but adds significant operational complexity and cost.
The aggregation pipeline is a classic Pipes-and-Filters architecture: each stage (log collection, aggregation, top-K reduction, store update) is independently scalable and testable.
Latency Target
Autocomplete must respond in < 100ms; typically < 50ms is required for smooth UX. This mandates:
- An in-memory cache layer (see Distributed-Cache) placed in front of the trie/sorted-set store; TTL aligned with batch frequency.
- Prefix-level sharding across trie servers so no single node becomes a hotspot. Routing via Consistent-Hashing on prefix key.
CQRS Framing
The autocomplete index is a CQRS read model (see CQRS-Pattern): the write path collects and aggregates query logs; the read path queries pre-aggregated projections. The two paths are decoupled -- the aggregation pipeline can be paused or rebuilt without affecting autocomplete availability.
Component Design
[Client] --> [Load Balancer] --> [Autocomplete API]
|
[Cache Layer (prefix -> top-K)]
|
[Trie / Sorted Set Servers]
(sharded by prefix range)
^
[Aggregation Pipeline]
(batch: hourly or daily)
^
[Query Log Store]
Component responsibilities:
- Autocomplete API -- stateless; receives prefix string, checks cache, falls through to trie/sorted-set servers if cache miss; returns top-K list.
- Cache Layer -- in-memory cache (e.g., Redis or Memcached) with prefix as key and top-K list as value; TTL = batch interval; absorbs the majority of 578,500 QPS for popular prefixes.
- Trie / Sorted Set Servers -- store the canonical prefix-to-top-K mapping; sharded by prefix range using Consistent-Hashing; cache miss traffic is a small fraction of total QPS.
- Query Log Store -- append-only log of search queries; input to the aggregation pipeline.
- Aggregation Pipeline -- hourly batch job; aggregates query frequencies, computes top-K per prefix, writes updated completions to the store.
Prefix-range sharding using Database-Sharding ensures that a-f prefixes, g-m prefixes, n-z prefixes, etc. route to distinct shard groups, preventing hotspot concentration on common prefixes like "th-" or "in-".
System Diagram
Search-Autocomplete-Design-diagram.excalidraw
Alternatives Considered
| Decision | Alternative A | Alternative B | Why Chosen Approach Wins |
|---|---|---|---|
| Top-K store: Redis sorted sets | In-memory trie (rebuild from batch) | Disk-backed trie (RocksDB) | Redis sorted sets allow incremental updates via ZINCRBY; trie requires full rebuild from batch; disk-backed trie adds read latency |
| Aggregation: hourly batch | Real-time streaming (Kafka + Flink) | Daily batch | Hourly batch balances freshness (1 hour staleness) vs ops complexity; streaming adds Flink cluster management; daily batch is too stale for trending queries |
| Trie storage: in-memory | Disk-backed (RocksDB) | Distributed across nodes (ZooKeeper-coordinated) | In-memory fits 250 MB English trie on a single server; disk-backed adds I/O latency on prefix lookup; distributed coordination adds failure modes |
Likely Follow-Up Questions
- How do you handle trending queries? -- Real-time trending requires streaming aggregation (Kafka topic → Flink aggregator → sorted set update every 5-60 minutes); adds operational complexity but reduces freshness window from hours to minutes.
- How do you support multi-language autocomplete? -- Each language requires a separate prefix index. Unicode normalization (NFC/NFD) must be applied consistently at both write (log aggregation) and read (prefix key construction) time. CJK languages require character-based rather than word-based prefix decomposition.
- How do you filter offensive or harmful suggestions? -- Maintain a blocklist of disallowed completions; apply as a post-filter step after top-K retrieval; blocklist updates require cache invalidation to take effect immediately.
- How would you add personalised suggestions? -- Blend global top-K score with per-user history score at read time; requires per-user prefix history store; adds latency to the read path; typically only applied to signed-in users.
- How do you handle prefix collisions across languages? -- Namespace prefix keys by locale (e.g.,
en:thevsja:the); route to locale-specific trie shard; CDN geolocation can pre-select locale for anonymous users. - What happens when the aggregation pipeline fails? -- Stale completions remain available (last successful batch); autocomplete degrades gracefully to showing older suggestions rather than returning an error; pipeline failure does not affect read availability.
Existing Pattern Connections
| Design Decision | Existing Pattern | Relationship |
|---|---|---|
| Precomputed top-K completions per prefix | CQRS-Pattern | Autocomplete index is a CQRS read model: write path collects query logs; read path queries pre-aggregated projections |
| Hourly batch aggregation pipeline | Pipes-and-Filters | Log collection -> aggregation -> reduction -> store update is a classic pipes-and-filters pipeline; each stage is independently scalable |
| Consistent hash routing for prefix sharding | Consistent-Hashing | Prefix key space distributed across trie servers using consistent hash ring; no single server is a bottleneck |
| Cache layer in front of trie | Distributed-Cache | Cache-aside on prefix -> top-K mapping; TTL aligned with batch frequency (hourly); reduces trie server load for popular prefixes |
| Trie data structure for prefix lookup | suffix-trie-construction | The O(p) prefix traversal algorithm that powers the autocomplete trie described in this note; see the algorithm note for trie construction, suffix handling, and space optimisation |
Additional cross-links woven into the design: Database-Sharding (prefix range sharding at scale), Capacity-Estimation (shared estimation methodology). Algorithm cross-link: suffix-trie-construction (trie construction and traversal algorithm).