According to wikipedia there's a possible modification to the Wagner-Fischer-algorithm that can calculate if the Levenshtein distance of two words is lower than a certain threshold, that is much quicker than the original one if that's all you want to know.
"By examining diagonals instead of rows, and by using lazy evaluation, we can find the Levenshtein distance in O(m (1 + d)) time (where d is the Levenshtein distance), which is much faster than the regular dynamic programming algorithm if the distance is small."
How does this solution work? I'm having a real hard time visualizing it since it feels like the value of any matrix cell depends on the values of the cells above, to the left and diagonally up to the left of it, so I'm not sure how to traverse the matrix using only a diagonal strip.
Second attempt at explanation:
Suppose that we're finding the distance between a length-m word and a length-n word. Let the matrix entries be indexed by [0, m] × [0, n], where the (i, j) entry represents the edit distance between the length-i prefix of the length-m word and the length-j prefix of the length-n word.
We can view the dynamic program as finding the shortest path from (0, 0) to (m, n) in a directed graph whose vertices correspond to matrix entries, with length-1 rightward arcs and length-1 downward arcs and length-0 or length-1 diagonal arcs depending on whether the characters at i and j match. The idea, in a nutshell, is to use A* with the length difference heuristic H(i, j) = |(m - i) - (n - j)|. Then, we don't expand nodes whose A* value is greater than d. The result is that only some of the diagonals need to be opened:
o t h e r w o r d
t * * *
h * * *
e * * *
w * * *
o * * *
r * * *
d * * *
First attempt at explanation:
Each matrix entry (i, j) is lowerbounded by |i - j|, because that's a lowerbound on the number of non-diagonal moves needed to reach that state. The strip is every element whose coordinates (i, j) satisfies |i - j| ≤ d, which looks like
o t h e r w o r d
t * * *
h * * * *
e * * * * *
w * * * * *
o * * * * *
r * * * * *
d * * * * *
for d = 2. When the values for the blank elements off of the strip are needed, just use d. At the end, any strip entry that's ≤ d is valid, because the blank elements can only contribute d + 1, seeing as the upper left neighbor of a strip element is also on the strip.
For words of different lengths, we can actually apply the argument to the transpose and restrict to the strip like
o t h e r w o r d
t * * *
h * * *
e * * *
w * * *
o * * *
r * * *
d * * *
though the asymptotic running time is the same.