Search Autocomplete Design

Search Autocomplete Design

System that suggests top-K query completions as the user types each character, balancing precomputed prefix indices against data freshness from query log aggregation.

Clarify First

Before designing, lock these assumptions with the interviewer:

  1. Top-K value? — 5, 10, or 20 suggestions changes index storage and response payload size; affects mobile vs desktop UX tradeoffs.
  2. Freshness requirement? — Hourly batch aggregation acceptable, or does the system need real-time trending (e.g., breaking news terms appearing in completions within minutes)?
  3. Multi-language support? — Single locale (ASCII only) vs global Unicode normalization; CJK tokenization requires fundamentally different prefix index design.
  4. Personalisation? — Global popularity ranking vs per-user history weighting; personalised ranking requires per-user state storage and significantly complicates the read path.
  5. Scale target? — 1B queries/day (mid-tier) vs 10B queries/day (Google-scale); changes whether a distributed trie or a single-node sorted-set store is required.

Capacity Estimation

Derivation chain for Google-scale autocomplete (2026):

Assumption: 10B search queries/day (Google-scale, 2026 estimate)
  query_QPS = 10B / 86,400 ~ 115,700 QPS
  autocomplete_requests_per_query ~ 5 (user types 5 chars before selecting)
  autocomplete_QPS = 115,700 x 5 ~ 578,500 QPS

Response size:
  top-10 completions x 30 bytes each = 300 bytes per response
  bandwidth = 578,500 x 300 bytes ~ 174 MB/s outbound

Trie storage:
  English vocabulary: ~100K unique words
  average prefix length: 5
  trie nodes: ~100K x 5 = 500K nodes
  per-node storage (top-10 completions + freq): ~500 bytes
  total trie: 500K x 500 bytes = 250 MB -- fits in memory of a single server
  -> Distributed trie or consistent-hash sharding required only at Google scale

Cross-reference: Capacity-Estimation for the shared DAU-to-QPS-to-storage methodology.

Conclusion: At 578,500 QPS for autocomplete requests, an in-memory cache layer in front of the trie/sorted-set store is mandatory. The 250 MB trie fits on a single server for the English corpus but requires Consistent-Hashing-based prefix sharding at Google scale.

Central Technical Problem

Top-K precomputation pipeline latency vs freshness tradeoff.

Serving prefix completions from a live trie at query time is fast, but updating the trie with real-time query frequency data is expensive. Batching updates reduces overhead but introduces staleness -- top completions may be hours or days old.

Trie vs Redis Sorted Set

Trie data structure:

  • Stores prefixes as tree paths; each node stores top-K completions for that prefix.
  • O(p) lookup where p = prefix length -- fast read path.
  • High memory usage; full trie for a large corpus is hundreds of megabytes to gigabytes.
  • Difficult to update incrementally -- typically rebuilt from a batch job. Any partial update risks leaving the trie in an inconsistent state during traversal.
  • Standard teaching structure (Alex Xu Vol. 1, Ch. 13) but not the production-standard approach.

Redis sorted set per prefix:

  • Each prefix key maps to a sorted set of (completion, frequency) pairs.
  • Range query ZREVRANGE prefix 0 K-1 WITHSCORES returns top-K by score in O(log N + K).
  • Incremental update via ZINCRBY prefix score completion -- no full rebuild required.
  • High key count (~millions of prefix keys for a large corpus) but operationally simpler than batch trie rebuilds.

Decision: Redis sorted sets are the standard production approach for top-K autocomplete. Trie is the canonical teaching structure but requires periodic full rebuild from batch, making it harder to keep fresh. See Distributed-Cache for the cache-aside layer placed in front of either store.

Precomputation Pipeline

The core tradeoff: every reduction in batch frequency improves freshness but increases load on the aggregation pipeline and trie rebuild time.

[Query Log] --> [Log Aggregator (hourly batch)] --> [Top-K Reducer] --> [Trie / Sorted Set Store]
                                                                              |
                                                          [Autocomplete API] <-- cache layer
  • Freshness window = batch frequency. Hourly batch means completions are up to 1 hour stale.
  • Daily batch (simpler ops) means completions can be 24 hours stale -- unacceptable for news or trending events.
  • Real-time trending (Twitter "trending now") requires streaming aggregation via Kafka + Flink, but adds significant operational complexity and cost.

The aggregation pipeline is a classic Pipes-and-Filters architecture: each stage (log collection, aggregation, top-K reduction, store update) is independently scalable and testable.

Latency Target

Autocomplete must respond in < 100ms; typically < 50ms is required for smooth UX. This mandates:

  1. An in-memory cache layer (see Distributed-Cache) placed in front of the trie/sorted-set store; TTL aligned with batch frequency.
  2. Prefix-level sharding across trie servers so no single node becomes a hotspot. Routing via Consistent-Hashing on prefix key.

CQRS Framing

The autocomplete index is a CQRS read model (see CQRS-Pattern): the write path collects and aggregates query logs; the read path queries pre-aggregated projections. The two paths are decoupled -- the aggregation pipeline can be paused or rebuilt without affecting autocomplete availability.

Component Design

[Client] --> [Load Balancer] --> [Autocomplete API]
                                       |
                            [Cache Layer (prefix -> top-K)]
                                       |
                            [Trie / Sorted Set Servers]
                                 (sharded by prefix range)
                                       ^
                            [Aggregation Pipeline]
                             (batch: hourly or daily)
                                       ^
                              [Query Log Store]

Component responsibilities:

  • Autocomplete API -- stateless; receives prefix string, checks cache, falls through to trie/sorted-set servers if cache miss; returns top-K list.
  • Cache Layer -- in-memory cache (e.g., Redis or Memcached) with prefix as key and top-K list as value; TTL = batch interval; absorbs the majority of 578,500 QPS for popular prefixes.
  • Trie / Sorted Set Servers -- store the canonical prefix-to-top-K mapping; sharded by prefix range using Consistent-Hashing; cache miss traffic is a small fraction of total QPS.
  • Query Log Store -- append-only log of search queries; input to the aggregation pipeline.
  • Aggregation Pipeline -- hourly batch job; aggregates query frequencies, computes top-K per prefix, writes updated completions to the store.

Prefix-range sharding using Database-Sharding ensures that a-f prefixes, g-m prefixes, n-z prefixes, etc. route to distinct shard groups, preventing hotspot concentration on common prefixes like "th-" or "in-".

System Diagram

Search-Autocomplete-Design-diagram.excalidraw

Alternatives Considered

DecisionAlternative AAlternative BWhy Chosen Approach Wins
Top-K store: Redis sorted setsIn-memory trie (rebuild from batch)Disk-backed trie (RocksDB)Redis sorted sets allow incremental updates via ZINCRBY; trie requires full rebuild from batch; disk-backed trie adds read latency
Aggregation: hourly batchReal-time streaming (Kafka + Flink)Daily batchHourly batch balances freshness (1 hour staleness) vs ops complexity; streaming adds Flink cluster management; daily batch is too stale for trending queries
Trie storage: in-memoryDisk-backed (RocksDB)Distributed across nodes (ZooKeeper-coordinated)In-memory fits 250 MB English trie on a single server; disk-backed adds I/O latency on prefix lookup; distributed coordination adds failure modes

Likely Follow-Up Questions

  1. How do you handle trending queries? -- Real-time trending requires streaming aggregation (Kafka topic → Flink aggregator → sorted set update every 5-60 minutes); adds operational complexity but reduces freshness window from hours to minutes.
  2. How do you support multi-language autocomplete? -- Each language requires a separate prefix index. Unicode normalization (NFC/NFD) must be applied consistently at both write (log aggregation) and read (prefix key construction) time. CJK languages require character-based rather than word-based prefix decomposition.
  3. How do you filter offensive or harmful suggestions? -- Maintain a blocklist of disallowed completions; apply as a post-filter step after top-K retrieval; blocklist updates require cache invalidation to take effect immediately.
  4. How would you add personalised suggestions? -- Blend global top-K score with per-user history score at read time; requires per-user prefix history store; adds latency to the read path; typically only applied to signed-in users.
  5. How do you handle prefix collisions across languages? -- Namespace prefix keys by locale (e.g., en:the vs ja:the); route to locale-specific trie shard; CDN geolocation can pre-select locale for anonymous users.
  6. What happens when the aggregation pipeline fails? -- Stale completions remain available (last successful batch); autocomplete degrades gracefully to showing older suggestions rather than returning an error; pipeline failure does not affect read availability.

Existing Pattern Connections

Design DecisionExisting PatternRelationship
Precomputed top-K completions per prefixCQRS-PatternAutocomplete index is a CQRS read model: write path collects query logs; read path queries pre-aggregated projections
Hourly batch aggregation pipelinePipes-and-FiltersLog collection -> aggregation -> reduction -> store update is a classic pipes-and-filters pipeline; each stage is independently scalable
Consistent hash routing for prefix shardingConsistent-HashingPrefix key space distributed across trie servers using consistent hash ring; no single server is a bottleneck
Cache layer in front of trieDistributed-CacheCache-aside on prefix -> top-K mapping; TTL aligned with batch frequency (hourly); reduces trie server load for popular prefixes
Trie data structure for prefix lookupsuffix-trie-constructionThe O(p) prefix traversal algorithm that powers the autocomplete trie described in this note; see the algorithm note for trie construction, suffix handling, and space optimisation

Additional cross-links woven into the design: Database-Sharding (prefix range sharding at scale), Capacity-Estimation (shared estimation methodology). Algorithm cross-link: suffix-trie-construction (trie construction and traversal algorithm).