Longest Common Subsequence
Longest Common Subsequence
Category
Dynamic Programming
Difficulty
Hard
Problem Statement
Given two strings, find the longest common subsequence (LCS). A subsequence is a sequence of characters that appears in the same relative order in both strings, but not necessarily contiguously. Return the LCS itself (not just its length).
Intuition
If the last characters of both strings match, that character is part of the LCS, and we recurse on both strings with the last character removed. If they do not match, the LCS is the longer of the two results obtained by removing the last character of one string or the other. This overlapping subproblem structure makes dynamic programming ideal.
We build a 2D table where dp[i][j] represents the length of the LCS of the first i characters of string1 and the first j characters of string2. After filling the table, we backtrack to reconstruct the actual subsequence.
Approach
- Create a 2D table
dpof size (n+1) x (m+1), initialized to 0. - Fill the table:
- If
str1[i-1] == str2[j-1]:dp[i][j] = dp[i-1][j-1] + 1. - Else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
- If
- Backtrack from
dp[n][m]to reconstruct the LCS:- If characters match, include the character and move diagonally.
- Otherwise, move in the direction of the larger value.
Pseudocode
function longestCommonSubsequence(str1, str2):
n = length(str1)
m = length(str2)
dp = 2D array of (n + 1) x (m + 1), initialized to 0
for i from 1 to n:
for j from 1 to m:
if str1[i - 1] == str2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
// Backtrack to find the LCS
lcs = []
i = n, j = m
while i > 0 and j > 0:
if str1[i - 1] == str2[j - 1]:
lcs.prepend(str1[i - 1])
i -= 1
j -= 1
else if dp[i - 1][j] > dp[i][j - 1]:
i -= 1
else:
j -= 1
return lcs
Time & Space Complexity
- Time: O(n * m), where n and m are the lengths of the two strings. Every cell is computed once, and backtracking takes O(n + m).
- Space: O(n * m) for the DP table. If only the length is needed (not the actual subsequence), space can be reduced to O(min(n, m)) using two rows.
Key Insights
- LCS is fundamentally different from longest common substring: subsequences do not require contiguity.
- The backtracking step is essential for recovering the actual subsequence from the DP table.
- There may be multiple LCS of the same length; the backtracking procedure returns one of them.
- LCS has applications in diff tools, DNA sequence alignment, and version control systems.
- The problem is related to edit distance: edit distance = n + m - 2 * LCS_length (when only insertions and deletions are allowed).
Edge Cases
- One or both strings are empty: the LCS is empty.
- Identical strings: the LCS is the string itself.
- No common characters: the LCS is empty.
- One string is a subsequence of the other: the LCS is the shorter string.
- Strings of length 1: the LCS is that character if it appears in both, otherwise empty.