Multi String Search
Multi String Search
Category
Tries
Difficulty
Hard
Problem Statement
Given a "big" string and an array of "small" strings, determine for each small string whether it is contained in the big string. Return an array of booleans corresponding to each small string.
Intuition
A naive approach checks each small string individually using substring search, which can be expensive if there are many small strings. Instead, we build a trie from the small strings and then traverse the big string, checking against the trie from each starting position. Alternatively, we can build a suffix trie from the big string and look up each small string, though this uses more space.
The trie-based approach is efficient because it allows us to search for all small strings simultaneously during a single scan-like process over the big string.
Approach
Using a Trie of Small Strings
- Build a trie from all small strings, marking the end of each word with its index.
- For each starting index
iin the big string: a. Traverse the trie character by character starting frombigString[i]. b. Whenever a word-end marker is encountered, mark that small string as found. c. Stop if the character is not in the trie or the big string is exhausted. - Return the results array.
Pseudocode
function multiStringSearch(bigString, smallStrings):
trie = new Trie()
for i from 0 to length(smallStrings) - 1:
trie.insert(smallStrings[i], i)
results = array of length(smallStrings), all false
for i from 0 to length(bigString) - 1:
searchTrie(bigString, i, trie, results)
return results
function searchTrie(bigString, startIdx, trie, results):
node = trie.root
for i from startIdx to length(bigString) - 1:
char = bigString[i]
if char not in node.children:
break
node = node.children[char]
if node.wordIndices is not empty:
for idx in node.wordIndices:
results[idx] = true
Time & Space Complexity
- Time: O(b * s + n * s), where b is the length of the big string, s is the length of the longest small string, and n is the number of small strings. Building the trie takes O(n * s). Searching takes O(b * s) in the worst case.
- Space: O(n * s) for the trie.
Key Insights
- The trie allows efficient prefix matching against multiple patterns simultaneously.
- Building the trie from small strings (rather than a suffix trie from the big string) is more space-efficient when the total length of small strings is much less than the big string length.
- For very large big strings with few small strings, individual substring searches (e.g., KMP or Rabin-Karp) might be simpler.
- The Aho-Corasick algorithm builds on this approach by adding failure links for even more efficient multi-pattern matching in O(b + n*s + matches).
Edge Cases
- Empty small strings array: return an empty array.
- A small string is empty: it is trivially contained in any string.
- A small string is longer than the big string: it cannot be contained.
- Duplicate small strings: each occurrence in the array should be marked independently.
- Big string is empty: all small strings (except empty ones) return false.
- Small strings that are prefixes of each other: the trie handles this naturally with markers at multiple levels.