Longest String Chain

Longest String Chain

Category

Dynamic Programming

Difficulty

Very Hard

Problem Statement

Given a list of strings, find the length of the longest string chain. A string chain is a sequence of strings where each string is a predecessor of the next. String A is a predecessor of string B if B can be formed by inserting exactly one character at any position in A. For example, "abc" is a predecessor of "abcd" (insert 'd'), and a valid chain might be "a" -> "ab" -> "abc" -> "abcd".

Intuition

If we sort strings by length, then any valid chain must visit strings in non-decreasing order of length, and consecutive chain members differ by exactly one character in length. This lets us build a DP solution: for each string (processed in order of increasing length), we check all possible predecessors (formed by removing one character at a time) and take the best chain length among them. Using a hash map for O(1) lookups makes this efficient.

Approach

  1. Sort the input strings by length.
  2. Create a hash map chainLength where chainLength[s] stores the length of the longest chain ending at string s.
  3. For each string s in sorted order:
    • Initialize chainLength[s] = 1 (the string alone is a chain of length 1).
    • For each index i from 0 to length(s) - 1:
      • Form the predecessor pred by removing the character at index i from s.
      • If pred exists in chainLength:
        • chainLength[s] = max(chainLength[s], chainLength[pred] + 1)
  4. Return the maximum value in chainLength.

Pseudocode

function longestStringChain(words):
    sort words by length

    chainLength = empty hash map
    maxChain = 0

    for word in words:
        chainLength[word] = 1
        for i from 0 to length(word) - 1:
            predecessor = word[0..i-1] + word[i+1..end]
            if predecessor in chainLength:
                chainLength[word] = max(chainLength[word],
                                        chainLength[predecessor] + 1)
        maxChain = max(maxChain, chainLength[word])

    return maxChain

Time & Space Complexity

  • Time: O(n * L^2) where n is the number of words and L is the maximum word length. Sorting is O(n log n). For each word, we generate L predecessors, and forming each predecessor string takes O(L) time.
  • Space: O(n * L) for the hash map storing all words and their chain lengths.

Key Insights

  • Sorting by length guarantees that when we process a word, all its potential predecessors have already been processed.
  • Generating predecessors (removing one character) is more efficient than checking all shorter words for the predecessor relationship, because the number of predecessors is bounded by the word length, not by the total number of words.
  • The problem is essentially finding the longest path in a DAG where edges connect predecessors to successors.
  • Hash map lookups allow O(1) amortized checking of whether a predecessor exists in the word list.

Edge Cases

  • Single word: The longest chain is 1.
  • No word is a predecessor of another: Every chain has length 1.
  • All words have the same length: No predecessor relationships exist, answer is 1.
  • Duplicate words in input: Depending on problem constraints, duplicates may need deduplication. A chain cannot use the same word twice.
  • Words of length 1: These are valid chain starters; they have no predecessors (since removing the only character yields an empty string, which is not typically in the word list).
  • Very long words: The predecessor generation loop scales with word length, so performance depends on L as well as n.