Search code examples
algorithmlanguage-agnosticdynamic-programming

Proof of correct of the dynamic programming approach to min edit distance


To calculate min edit distance (the minimum amount of insertions, deletions and substitutions required to transform one word to another), a dynamic programming solution is based on the recurrence relation, where the last character of both string is examined. The details are in https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm.

The description of this algorithm is everywhere on the Internet when it comes to edit distance, but all of them just asserts its correctness without proof. By definition of edit distance, you can insert, delete or substitute characters in the middle, not just at the end. Then how do you prove that this recurrence relation actually holds?


Solution

  • Use Induction to Prove Recursive Algorithms Correct

    First, as I said in the comment, you can view dynamic programming as a way to speed up recursion, and the easiest way to prove a recursive algorithm correct is nearly always by induction: Show that it's correct on some small base case(s), and then show that, assuming it is correct for a problem of size n, it is also correct for a problem of size n+1. This way the proof closely mirrors the recursive structure.

    The usual recursion for this problem breaks the problem "Find the minimum cost to edit string A into string B" into the (|A|+1)(|B|+1) subproblems "Find the minimum cost to edit the first i characters of string A into the first j characters of string B", for all 0 <= i <= |A| and 0 <= j <= |B|.

    Choosing Base Cases

    Usually with induction, we can pick a small number of simple base cases (perhaps just one), show that we can easily compute the correct answers for them, and it's obvious how the correctness of all other cases will be implied by the correctness of the base cases, because regardless of what case we start with, there will be just a single "chain" of assumptions that needs to be satisfied, and this chain clearly must end at one of our base cases. However for this particular problem, to show that the algorithm solves the (i, j) subproblem optimally, we need to first assume that it solves the (i-1, j), (i, j-1) and (i-1, j-1) subproblems optimally (since if any of the answers to those subproblems was incorrect, it could easily compute a totally wrong answer for the (i, j) subproblem). This will require a more complicated induction than usual: instead of a single "chain" of assumptions that need to be satisfied, we now have a branching "tree" of assumptions, with (up to) 3 children at each point. We need to pick base cases in such a way that for any (i, j), this entire tree of assumptions eventually "stops", i.e., every path in it eventually hits a base case where its assumptions are satisfied.

    To recap: To prove our solution to (i, j) optimal, we must assume we have optimal solutions for (i-1, j), (i, j-1), and (i-1, j-1); to satisfy that assumption for, say, (i-1, j) (that is, to prove our solution to (i-1, j) is optimal), we need to assume we have optimal solutions for (i-2, j), (i-1, j-1) and (i-2, j-1), etc., etc. How to choose base case(s) that will work? While traversing down this tree, i.e., in going from proving our solution to the subproblem (i, j) correct to proving our solution to any one of its "child" subproblems (i', j') correct, we notice that:

    1. At least one of i' < i or j' < j holds.
    2. We never "skip over" subproblems -- that is, we never have i-i' >= 2, or j-j' >= 2.

    Basically, if we take a single step down this tree, at least one of our two "subproblem co-ordinates" (i or j) decreases, but never by more than 1. What that means is that if we keep descending through the tree, then regardless of which particular "children" subproblems we pick on the way down, we must eventually hit a subproblem with a 0 for (at least) one of its co-ordinates -- i.e. a (i'', 0) subproblem for some 0 <= i'' <= |A| or a (0, j'') subproblem for some 0 <= j'' <= |B|. And what that means is that if we make those subproblems our base cases, we can ensure that every path in the induction tree hits a base case where its assumptions are satisfied and can therefore stop.

    Luckily, these base cases are indeed easy to compute optimal answers to. Consider a problem (i, 0): This problem asks for the minimum cost required to change the first i characters of string A into the first 0 characters of string B. Clearly the best (only!) way to do this is to delete all i characters in the prefix of A, for a cost of i. Likewise the problem (0, j) asks for the minimum cost required to change the first 0 characters of A into the first j characters of B: just as clearly, the best way to do this is to simply insert all j characters in this prefix of B, for a cost of j.

    Inductive Step

    All that remains is the inductive step: Showing that we compute the answer to the (i, j) subproblem correctly, under the assumption that we have computed the answers to the (i-1, j), (i, j-1) and (i-1, j-1) subproblems correctly. The trick is to see that in every possible way of editing the first i characters of A into the first j characters of B, there are actually only 3 possible things that we could do with the last character in each of these prefixes (that is, the i-th character in A and the j-th character in B):

    • Pair A[i] with B[j]. Either they match (cost 0) or not (cost 1), but either way, the total cost of this pairing must be that cost (either 0 or 1), plus the smallest possible cost of editing the rest of the prefix of A (A[1 .. i-1]) into the rest of the prefix of B (B[1 .. j-1]) -- which, by assumption, we have already calculated correctly!
    • Delete A[i]. This costs 1, so the total cost of doing this must be 1 plus the smallest possible cost of editing the rest of the prefix of A (A[1 .. i-1]) into the entire prefix of B (B[1 .. j]) -- which, by assumption, we have already calculated correctly!
    • Insert B[j]. This costs 1, so the total cost of doing this must be 1 plus the smallest possible cost of editing the entire prefix of A (A[1 .. i]) into the rest of the prefix of B (B[1 .. j-1]) -- which, by assumption, we have already calculated correctly!

    Since those 3 things are the only possible things that we could do, and for each of the 3 we have calculated the overall lowest cost of doing that thing, the overall best thing to do must be the best of the 3 of them. This proves that we correctly calculate the lowest cost needed to edit the first i characters of A into the first j characters of B, for any i and j -- so in particular, it's true for i = |A| and j = |B|, that is, for editing the complete string A into the complete string B.