Suffix Trie Construction

Suffix Trie Construction

Category

Tries

Difficulty

Medium

Problem Statement

Implement a Suffix Trie class that supports two operations:

  • Construction: Build a suffix trie from a given string. The trie should contain every suffix of the string.
  • contains(string): Check whether a given string is a suffix of the original string.

A suffix trie is a tree-like data structure where each path from the root to a special end marker represents a suffix of the input string.

Intuition

A suffix trie stores all suffixes of a string in a tree structure. Starting from each position in the string, we insert the substring from that position to the end. Each character becomes a node, and a special end symbol (e.g., *) marks the termination of a suffix. To check if a string is a suffix, we simply follow the characters from the root and verify that the path ends with the end symbol.

Approach

Construction

  1. Initialize the root as an empty hash map (dictionary).
  2. For each index i from 0 to the length of the string:
    • Starting from the root, insert each character of the substring string[i:] one by one.
    • At each character, if the character does not exist as a key in the current node, create a new empty hash map for it.
    • Move to the child node.
    • After all characters are inserted, add the end symbol * with a true value.

contains

  1. Starting from the root, follow each character of the query string through the trie.
  2. If any character is not found, return false.
  3. After processing all characters, check if the end symbol * exists at the current node.

Pseudocode

class SuffixTrie:
    endSymbol = "*"
    root = {}

    function constructor(string):
        for i from 0 to length(string) - 1:
            insertSuffix(string, i)

    function insertSuffix(string, startIndex):
        node = root
        for i from startIndex to length(string) - 1:
            char = string[i]
            if char not in node:
                node[char] = {}
            node = node[char]
        node[endSymbol] = true

    function contains(string):
        node = root
        for char in string:
            if char not in node:
                return false
            node = node[char]
        return endSymbol in node

Time & Space Complexity

  • Construction: O(n^2) time and O(n^2) space, where n is the length of the string. There are n suffixes, and their total length is n + (n-1) + ... + 1 = n(n+1)/2.
  • contains: O(m) time where m is the length of the query string. O(1) space.

Key Insights

  • A suffix trie is different from a suffix tree. A suffix trie has one node per character and can have O(n^2) nodes. A suffix tree compresses paths and uses O(n) nodes.
  • The end symbol is crucial for distinguishing between a suffix and a mere substring. Without it, contains("ab") would return true for the string "abc" even though "ab" is not a suffix.
  • Each node is a hash map, making lookups O(1) per character on average.
  • Suffix tries enable fast pattern matching at the cost of quadratic construction time and space.

System Design Application

The Trie data structure is the canonical prefix lookup mechanism in Search-Autocomplete-Design. Given a user typing a prefix, the autocomplete system traverses the trie to find all completions in O(p) time where p is the prefix length. See Search-Autocomplete-Design for the system design context: precomputed top-K completions per prefix node, sharding the trie across nodes via consistent hashing, and cache-aside on prefix lookups.

Edge Cases

  • Empty string: the trie root contains only the end symbol. contains("") returns true.
  • Single-character string: the trie has one path of length 1 plus the end symbol.
  • Query string that is a substring but not a suffix: should return false.
  • Query string longer than the original string: returns false (will fail during traversal).
  • Repeated characters in the string (e.g., "aaa"): suffixes share prefixes, so many trie branches overlap.