Suffix Trie Construction
Suffix Trie Construction
Category
Tries
Difficulty
Medium
Problem Statement
Implement a Suffix Trie class that supports two operations:
- Construction: Build a suffix trie from a given string. The trie should contain every suffix of the string.
- contains(string): Check whether a given string is a suffix of the original string.
A suffix trie is a tree-like data structure where each path from the root to a special end marker represents a suffix of the input string.
Intuition
A suffix trie stores all suffixes of a string in a tree structure. Starting from each position in the string, we insert the substring from that position to the end. Each character becomes a node, and a special end symbol (e.g., *) marks the termination of a suffix. To check if a string is a suffix, we simply follow the characters from the root and verify that the path ends with the end symbol.
Approach
Construction
- Initialize the root as an empty hash map (dictionary).
- For each index
ifrom 0 to the length of the string:- Starting from the root, insert each character of the substring
string[i:]one by one. - At each character, if the character does not exist as a key in the current node, create a new empty hash map for it.
- Move to the child node.
- After all characters are inserted, add the end symbol
*with atruevalue.
- Starting from the root, insert each character of the substring
contains
- Starting from the root, follow each character of the query string through the trie.
- If any character is not found, return false.
- After processing all characters, check if the end symbol
*exists at the current node.
Pseudocode
class SuffixTrie:
endSymbol = "*"
root = {}
function constructor(string):
for i from 0 to length(string) - 1:
insertSuffix(string, i)
function insertSuffix(string, startIndex):
node = root
for i from startIndex to length(string) - 1:
char = string[i]
if char not in node:
node[char] = {}
node = node[char]
node[endSymbol] = true
function contains(string):
node = root
for char in string:
if char not in node:
return false
node = node[char]
return endSymbol in node
Time & Space Complexity
- Construction: O(n^2) time and O(n^2) space, where n is the length of the string. There are n suffixes, and their total length is n + (n-1) + ... + 1 = n(n+1)/2.
- contains: O(m) time where m is the length of the query string. O(1) space.
Key Insights
- A suffix trie is different from a suffix tree. A suffix trie has one node per character and can have O(n^2) nodes. A suffix tree compresses paths and uses O(n) nodes.
- The end symbol is crucial for distinguishing between a suffix and a mere substring. Without it,
contains("ab")would return true for the string "abc" even though "ab" is not a suffix. - Each node is a hash map, making lookups O(1) per character on average.
- Suffix tries enable fast pattern matching at the cost of quadratic construction time and space.
System Design Application
The Trie data structure is the canonical prefix lookup mechanism in Search-Autocomplete-Design. Given a user typing a prefix, the autocomplete system traverses the trie to find all completions in O(p) time where p is the prefix length. See Search-Autocomplete-Design for the system design context: precomputed top-K completions per prefix node, sharding the trie across nodes via consistent hashing, and cache-aside on prefix lookups.
Edge Cases
- Empty string: the trie root contains only the end symbol.
contains("")returns true. - Single-character string: the trie has one path of length 1 plus the end symbol.
- Query string that is a substring but not a suffix: should return false.
- Query string longer than the original string: returns false (will fail during traversal).
- Repeated characters in the string (e.g., "aaa"): suffixes share prefixes, so many trie branches overlap.