Given 2 strings s
and t
. I need to find for each substring in s
edit distance(Levenshtein distance) to t
. Actually I need to know for each i
position in s
what is the minimum edit distance for all substrings started at position i
.
For example:
t = "ab"
s = "sdabcb"
And I need to get something like:
{2,1,0,2,2}
Explanation:
1st position:
distance("ab", "sd") = 4 ( 2*subst )
distance("ab", "sda") = 3( 2*delete + insert )
distance("ab", "sdab") = 2 ( 2 * delete)
distance("ab", "sdabc") = 3 ( 3 * delete)
distance("ab", "sdabcb") = 4 ( 4 * delete)
So, minimum is 2
2nd position:
distance("ab", "da") = 2 (delete + insert)
distance("ab", "dab") = 1 (delete)
distance("ab", "dabc") = 2 (2*delete)
....
So, minimum is 1
3th position:
distance("ab", "ab") = 0
...
minimum is 0
and so on.
I can use brute force algorithm to solve this task, of course. But is there faster algorithm?
Thanks for help.
The Wagner-Fischer algorithm gives you the answer for all prefixes "for free".
http://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm
The last row of the Wagner-Fischer matrix contains the edit distance from each prefix of s
to t
.
So as a first crack at your problem, for each i
, run Wagner-Fischer and select the smallest element in the last row.
I will be curious to see if anyone else knows (or can find) a better approach.