Longest Most Frequent Prefix
Longest Most Frequent Prefix
Category
Tries
Difficulty
Medium
Problem Statement
Given an array of strings, find the prefix that appears most frequently across all strings. Among all prefixes with the highest frequency, return the longest one. Every string is considered to have all of its prefixes (e.g., "abc" has prefixes "a", "ab", and "abc"). A prefix must be a proper prefix or the string itself.
Intuition
A trie naturally counts prefix frequencies. When we insert each string into a trie, every node along the path from the root gets its count incremented by 1. After all strings are inserted, the count at each node tells us how many strings share that prefix. We then find the node with the maximum count, and among nodes with equal maximum count, we choose the one at the greatest depth (longest prefix).
Approach
- Build a trie, inserting all strings. At each node, maintain a count of how many strings pass through it.
- Traverse the trie (DFS or BFS) to find the prefix with the highest frequency.
- Among all prefixes with the same highest frequency, select the longest one.
- Reconstruct the prefix string by tracking the path from root to the chosen node.
Pseudocode
function longestMostFrequentPrefix(strings):
trie = new TrieNode()
for string in strings:
node = trie
for char in string:
if char not in node.children:
node.children[char] = new TrieNode()
node = node.children[char]
node.count += 1
// Find the prefix with max frequency, breaking ties by length
bestPrefix = ""
bestCount = 0
function dfs(node, currentPrefix):
if node.count > bestCount or
(node.count == bestCount and length(currentPrefix) > length(bestPrefix)):
bestCount = node.count
bestPrefix = currentPrefix
for char in node.children:
dfs(node.children[char], currentPrefix + char)
for char in trie.children:
dfs(trie.children[char], char)
return bestPrefix
Time & Space Complexity
- Time: O(n * m), where n is the number of strings and m is the average string length. Inserting all strings takes O(n * m), and traversing the trie takes O(total characters), which is also O(n * m).
- Space: O(n * m) for the trie in the worst case (no shared prefixes).
Key Insights
- The trie inherently computes prefix frequencies during insertion: each node's count reflects how many strings share that prefix.
- The frequency of any prefix is at most n (the total number of strings). Shorter prefixes generally have higher frequencies.
- If all strings are identical, the longest most frequent prefix is the string itself.
- The root of the trie has a count equal to n (all strings pass through it), but it represents the empty prefix, which is typically excluded.
Edge Cases
- All strings are identical: the longest most frequent prefix is the full string with frequency n.
- All strings are completely different (no shared characters): each single character has frequency 1; return any one of them.
- Single string in the array: the entire string is the answer.
- Empty strings array: no valid prefix exists.
- Strings that are prefixes of each other (e.g., "a", "ab", "abc"): "a" has the highest frequency.