Search Autocomplete Design

System that suggests top-K query completions as the user types each character, balancing precomputed prefix indices against data freshness from query log aggregation.

Clarify First

Before designing, lock these assumptions with the interviewer:

Top-K value? — 5, 10, or 20 suggestions changes index storage and response payload size; affects mobile vs desktop UX tradeoffs.
Freshness requirement? — Hourly batch aggregation acceptable, or does the system need real-time trending (e.g., breaking news terms appearing in completions within minutes)?
Multi-language support? — Single locale (ASCII only) vs global Unicode normalization; CJK tokenization requires fundamentally different prefix index design.
Personalisation? — Global popularity ranking vs per-user history weighting; personalised ranking requires per-user state storage and significantly complicates the read path.
Scale target? — 1B queries/day (mid-tier) vs 10B queries/day (Google-scale); changes whether a distributed trie or a single-node sorted-set store is required.

Capacity Estimation

Derivation chain for Google-scale autocomplete (2026):

Assumption: 10B search queries/day (Google-scale, 2026 estimate)
  query_QPS = 10B / 86,400 ~ 115,700 QPS
  autocomplete_requests_per_query ~ 5 (user types 5 chars before selecting)
  autocomplete_QPS = 115,700 x 5 ~ 578,500 QPS

Response size:
  top-10 completions x 30 bytes each = 300 bytes per response
  bandwidth = 578,500 x 300 bytes ~ 174 MB/s outbound

Trie storage:
  English vocabulary: ~100K unique words
  average prefix length: 5
  trie nodes: ~100K x 5 = 500K nodes
  per-node storage (top-10 completions + freq): ~500 bytes
  total trie: 500K x 500 bytes = 250 MB -- fits in memory of a single server
  -> Distributed trie or consistent-hash sharding required only at Google scale

Cross-reference: Capacity-Estimation for the shared DAU-to-QPS-to-storage methodology.

Conclusion: At 578,500 QPS for autocomplete requests, an in-memory cache layer in front of the trie/sorted-set store is mandatory. The 250 MB trie fits on a single server for the English corpus but requires Consistent-Hashing-based prefix sharding at Google scale.

Central Technical Problem

Top-K precomputation pipeline latency vs freshness tradeoff.

Serving prefix completions from a live trie at query time is fast, but updating the trie with real-time query frequency data is expensive. Batching updates reduces overhead but introduces staleness -- top completions may be hours or days old.

Trie vs Redis Sorted Set

Trie data structure:

Stores prefixes as tree paths; each node stores top-K completions for that prefix.
O(p) lookup where p = prefix length -- fast read path.
High memory usage; full trie for a large corpus is hundreds of megabytes to gigabytes.
Difficult to update incrementally -- typically rebuilt from a batch job. Any partial update risks leaving the trie in an inconsistent state during traversal.
Standard teaching structure (Alex Xu Vol. 1, Ch. 13) but not the production-standard approach.

Redis sorted set per prefix:

Each prefix key maps to a sorted set of (completion, frequency) pairs.
Range query ZREVRANGE prefix 0 K-1 WITHSCORES returns top-K by score in O(log N + K).
Incremental update via ZINCRBY prefix score completion -- no full rebuild required.
High key count (~millions of prefix keys for a large corpus) but operationally simpler than batch trie rebuilds.

Decision: Redis sorted sets are the standard production approach for top-K autocomplete. Trie is the canonical teaching structure but requires periodic full rebuild from batch, making it harder to keep fresh. See Distributed-Cache for the cache-aside layer placed in front of either store.

Precomputation Pipeline

The core tradeoff: every reduction in batch frequency improves freshness but increases load on the aggregation pipeline and trie rebuild time.

[Query Log] --> [Log Aggregator (hourly batch)] --> [Top-K Reducer] --> [Trie / Sorted Set Store]
                                                                              |
                                                          [Autocomplete API] <-- cache layer

Freshness window = batch frequency. Hourly batch means completions are up to 1 hour stale.
Daily batch (simpler ops) means completions can be 24 hours stale -- unacceptable for news or trending events.
Real-time trending (Twitter "trending now") requires streaming aggregation via Kafka + Flink, but adds significant operational complexity and cost.

The aggregation pipeline is a classic Pipes-and-Filters architecture: each stage (log collection, aggregation, top-K reduction, store update) is independently scalable and testable.

Latency Target

Autocomplete must respond in < 100ms; typically < 50ms is required for smooth UX. This mandates:

An in-memory cache layer (see Distributed-Cache) placed in front of the trie/sorted-set store; TTL aligned with batch frequency.
Prefix-level sharding across trie servers so no single node becomes a hotspot. Routing via Consistent-Hashing on prefix key.

CQRS Framing

The autocomplete index is a CQRS read model (see CQRS-Pattern): the write path collects and aggregates query logs; the read path queries pre-aggregated projections. The two paths are decoupled -- the aggregation pipeline can be paused or rebuilt without affecting autocomplete availability.

Component Design

[Client] --> [Load Balancer] --> [Autocomplete API]
                                       |
                            [Cache Layer (prefix -> top-K)]
                                       |
                            [Trie / Sorted Set Servers]
                                 (sharded by prefix range)
                                       ^
                            [Aggregation Pipeline]
                             (batch: hourly or daily)
                                       ^
                              [Query Log Store]

Component responsibilities:

Autocomplete API -- stateless; receives prefix string, checks cache, falls through to trie/sorted-set servers if cache miss; returns top-K list.
Cache Layer -- in-memory cache (e.g., Redis or Memcached) with prefix as key and top-K list as value; TTL = batch interval; absorbs the majority of 578,500 QPS for popular prefixes.
Trie / Sorted Set Servers -- store the canonical prefix-to-top-K mapping; sharded by prefix range using Consistent-Hashing; cache miss traffic is a small fraction of total QPS.
Query Log Store -- append-only log of search queries; input to the aggregation pipeline.
Aggregation Pipeline -- hourly batch job; aggregates query frequencies, computes top-K per prefix, writes updated completions to the store.

Prefix-range sharding using Database-Sharding ensures that a-f prefixes, g-m prefixes, n-z prefixes, etc. route to distinct shard groups, preventing hotspot concentration on common prefixes like "th-" or "in-".

System Diagram

Search-Autocomplete-Design-diagram.excalidraw

Alternatives Considered

Decision	Alternative A	Alternative B	Why Chosen Approach Wins
Top-K store: Redis sorted sets	In-memory trie (rebuild from batch)	Disk-backed trie (RocksDB)	Redis sorted sets allow incremental updates via ZINCRBY; trie requires full rebuild from batch; disk-backed trie adds read latency
Aggregation: hourly batch	Real-time streaming (Kafka + Flink)	Daily batch	Hourly batch balances freshness (1 hour staleness) vs ops complexity; streaming adds Flink cluster management; daily batch is too stale for trending queries
Trie storage: in-memory	Disk-backed (RocksDB)	Distributed across nodes (ZooKeeper-coordinated)	In-memory fits 250 MB English trie on a single server; disk-backed adds I/O latency on prefix lookup; distributed coordination adds failure modes

Likely Follow-Up Questions

How do you handle trending queries? -- Real-time trending requires streaming aggregation (Kafka topic → Flink aggregator → sorted set update every 5-60 minutes); adds operational complexity but reduces freshness window from hours to minutes.
How do you support multi-language autocomplete? -- Each language requires a separate prefix index. Unicode normalization (NFC/NFD) must be applied consistently at both write (log aggregation) and read (prefix key construction) time. CJK languages require character-based rather than word-based prefix decomposition.
How do you filter offensive or harmful suggestions? -- Maintain a blocklist of disallowed completions; apply as a post-filter step after top-K retrieval; blocklist updates require cache invalidation to take effect immediately.
How would you add personalised suggestions? -- Blend global top-K score with per-user history score at read time; requires per-user prefix history store; adds latency to the read path; typically only applied to signed-in users.
How do you handle prefix collisions across languages? -- Namespace prefix keys by locale (e.g., en:the vs ja:the); route to locale-specific trie shard; CDN geolocation can pre-select locale for anonymous users.
What happens when the aggregation pipeline fails? -- Stale completions remain available (last successful batch); autocomplete degrades gracefully to showing older suggestions rather than returning an error; pipeline failure does not affect read availability.

Existing Pattern Connections

Design Decision	Existing Pattern	Relationship
Precomputed top-K completions per prefix	CQRS-Pattern	Autocomplete index is a CQRS read model: write path collects query logs; read path queries pre-aggregated projections
Hourly batch aggregation pipeline	Pipes-and-Filters	Log collection -> aggregation -> reduction -> store update is a classic pipes-and-filters pipeline; each stage is independently scalable
Consistent hash routing for prefix sharding	Consistent-Hashing	Prefix key space distributed across trie servers using consistent hash ring; no single server is a bottleneck
Cache layer in front of trie	Distributed-Cache	Cache-aside on prefix -> top-K mapping; TTL aligned with batch frequency (hourly); reduces trie server load for popular prefixes
Trie data structure for prefix lookup	suffix-trie-construction	The O(p) prefix traversal algorithm that powers the autocomplete trie described in this note; see the algorithm note for trie construction, suffix handling, and space optimisation

Additional cross-links woven into the design: Database-Sharding (prefix range sharding at scale), Capacity-Estimation (shared estimation methodology). Algorithm cross-link: suffix-trie-construction (trie construction and traversal algorithm).

Search Autocomplete Design

Search Autocomplete Design

Clarify First

Capacity Estimation

Central Technical Problem

Trie vs Redis Sorted Set

Precomputation Pipeline

Latency Target

CQRS Framing

Component Design

System Diagram

Alternatives Considered

Likely Follow-Up Questions

Existing Pattern Connections

Backlinks

Linked mentions

Search Autocomplete Design

Tags

Search Autocomplete Design

Clarify First

Capacity Estimation

Central Technical Problem

Trie vs Redis Sorted Set

Precomputation Pipeline

Latency Target

CQRS Framing

Component Design

System Diagram

Alternatives Considered

Likely Follow-Up Questions

Existing Pattern Connections

Backlinks

Linked mentions