I have to perform a fuzzy matching between a source string and a set of pattern strings. This matching is given by the formula
1 - D(I,P) / max(length(I),length(P))
where
Once I have found P that maximizes this score, I would like to have the mapping between the common parts of I and P
For instance: if I="sunday" and P="saturday", the mapping would be sth like a list of the following pairs:
{{0, 0}, {1, 3}, {3, 5}, {4, 6}, {5, 7}}
as the common characters are 's', 'u', 'd', 'a' and 'y'
In this wikipedia article, one can easily find an implementation to compute the levenshtein distance but it isn't completely clear to me how I could get the mapping from the matrix built during the process it described. Can anyone enlighten me?
thanks
The mapping you give as an example doesn't incorporate the edit distance at all from what I can see, as it is just looking for common characters. Maybe I am misunderstanding you, but you do not need the edit distance matrix for your mapping of common characters; the only time you would look at the edit distance would be during your D(I,P) calculation to determine the highest scoring pattern string. To generate the mapping you gave as an example, it would be a simple matter of iterating through both strings to determine the character indices for identifying the pairs.