Search code examples
dynamic-programminglevenshtein-distanceedit-distance

Determining a sequence of edits that produces the Levenshtein distance


I am doing some work using Levenshtein (edit) distance using dynamic programming. I think I understand the Wagner-Fischer algorithm to do this efficiently. However, it doesn't look like the algorithm is constructive. If I compute that the edit distance between two strings is, e.g., 10, then I would also like to determine a particular sequence of 10 edits that turns one into the other. Can this be done efficiently too? If so, how?


Solution

  • It is very constructive. With resulting matrix it is possible to find all different sequences of edits that produce minimal distance.

    To find edits you have to pass resulting matrix in 'backward'. Start from result cell, (m,n).

    • If value of cell (m-1, n-1) is same, than characters on these places are same an no edit is needed.
    • If value of cell (m-1, n-1) is smaller, than find cell(s) from {(m-1, n-1), (m-1, n), (m, n-1)} with the smallest value. Position of cell(s) with the smallest value determines if substitution, deletion or insertion is performed. If there are more cells with the smallest value, than more sequences of edits can produce minimal distance. If you need only one sequence than choose any one of them.

    Make same check until path reaches cell (0,0).

    Path of checks determines edits performed in reverse order.