Levenshtein Distance
Levenshtein Distance
Category
Dynamic Programming
Difficulty
Medium
Problem Statement
Given two strings, compute the Levenshtein distance (also known as edit distance) between them. The Levenshtein distance is the minimum number of single-character operations required to transform one string into the other. The allowed operations are:
- Insert a character
- Delete a character
- Replace (substitute) a character
Intuition
This is a classic dynamic programming problem. We build a 2D table where dp[i][j] represents the minimum edit distance between the first i characters of string1 and the first j characters of string2. At each cell, we consider whether the current characters match (free transition from the diagonal) or we need one of the three operations (each corresponding to a different neighboring cell plus one).
The subproblem structure is clear: the solution for the full strings depends on solutions for shorter prefixes, and each cell only depends on three neighbors (left, above, diagonal).
Approach
- Create a 2D table of size (len(str1) + 1) x (len(str2) + 1).
- Initialize the first row as 0, 1, 2, ..., len(str2) (inserting all characters of str2).
- Initialize the first column as 0, 1, 2, ..., len(str1) (deleting all characters of str1).
- Fill the table row by row:
- If
str1[i-1] == str2[j-1], thendp[i][j] = dp[i-1][j-1](no operation needed). - Otherwise,
dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])(delete, insert, or replace).
- If
- Return
dp[len(str1)][len(str2)].
Pseudocode
function levenshteinDistance(str1, str2):
n = length(str1)
m = length(str2)
dp = 2D array of size (n + 1) x (m + 1)
for i from 0 to n:
dp[i][0] = i
for j from 0 to m:
dp[0][j] = j
for i from 1 to n:
for j from 1 to m:
if str1[i - 1] == str2[j - 1]:
dp[i][j] = dp[i - 1][j - 1]
else:
dp[i][j] = 1 + min(dp[i - 1][j], // delete
dp[i][j - 1], // insert
dp[i - 1][j - 1]) // replace
return dp[n][m]
Time & Space Complexity
- Time: O(n * m), where n and m are the lengths of the two strings. Every cell in the table is computed once.
- Space: O(n * m) for the full table. Can be optimized to O(min(n, m)) by only keeping two rows at a time, since each row only depends on the current and previous rows.
Key Insights
- The three operations (insert, delete, replace) map to three directions in the DP table: left, above, and diagonal.
- When characters match, no operation is needed, so the cost carries over from the diagonal.
- The space-optimized version uses only two rows (or even one row with a temporary variable) since each cell depends only on its immediate neighbors.
- Levenshtein distance is a metric: it satisfies non-negativity, identity of indiscernibles, symmetry, and the triangle inequality.
Edge Cases
- One or both strings are empty: the distance is the length of the non-empty string (all insertions or deletions).
- Identical strings: the distance is 0.
- Strings of length 1: the distance is either 0 (same character) or 1 (substitute).
- Strings with no common characters: the distance is max(n, m) at most, but often less due to substitutions.