string algorithm dynamic-programming edit-distance

Given the pairwise edit distance of a and b and b and c, can we find the pairwise edit distance of a and c?


If we have three strings a, b, and c, and we already know (or have already calculated) edit_distance(a, b) and edit_distance(b, c), can we efficiently calculate edit_distance(a, c) without actually comparing a and c?

*edit_distance(a, b) = the minimum number of character insertions, deletions, and replacements required to convert a into b.*


Solution

  • In general, no. For example, take

    • a = CAP
    • b = CAT
    • c = CAR

    Here, edit_distance(a, b) = 1 and edit_distance(b, c) = 1. Moreover, edit_distance(a, c) = 1.

    However, we could also have

    • a = CAP
    • b = CAT
    • c = RAT

    Here, edit_distance(a, b) = 1 and edit_distance(b, c) = 1, but edit_distance(a, c) = 2. The two known distances are the same in both examples, yet edit_distance(a, c) differs, so the edit distances of a and b and of b and c alone cannot determine the edit distance of a and c.
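
To check the two examples above by hand or by machine, here is a minimal sketch in Python. It assumes the usual Levenshtein definition (unit cost for each insertion, deletion, and replacement) and uses the standard dynamic-programming recurrence; the function name `edit_distance` simply mirrors the notation in the question.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP table."""
    # prev[j] = distance between the first i-1 characters of s and the first j of t
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # replace cs with ct, or keep a match
            ))
        prev = curr
    return prev[-1]


b = "CAT"
for a, c in [("CAP", "CAR"), ("CAP", "RAT")]:
    print(a, b, c, edit_distance(a, b), edit_distance(b, c), edit_distance(a, c))
# CAP CAT CAR 1 1 1
# CAP CAT RAT 1 1 2
```

Both rows share the same first two distances, and only the third column differs.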

    However, we do know that edit_distance(a, c) ≤ edit_distance(a, b) + edit_distance(b, c), since you can always apply the two sequences of edits one after the other to turn a into b and then b into c. More generally, edit distance is an integer-valued distance metric on strings, and this property is the basis of the BK-tree data structure.
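
Although the exact value cannot be recovered, the two known distances do pin edit_distance(a, c) to a range. The sketch below is an illustration, not part of the original answer: the upper bound is the triangle inequality stated above, and the lower bound |edit_distance(a, b) − edit_distance(b, c)| follows from the same inequality combined with symmetry. The helper name `distance_bounds` is hypothetical.

```python
from typing import Tuple


def distance_bounds(d_ab: int, d_bc: int) -> Tuple[int, int]:
    """Range that edit_distance(a, c) must fall in, given the other two distances."""
    lower = abs(d_ab - d_bc)  # reverse triangle inequality
    upper = d_ab + d_bc       # triangle inequality
    return lower, upper


# Both examples from the answer have d(a, b) = 1 and d(b, c) = 1.
lower, upper = distance_bounds(1, 1)
print(lower, upper)  # 0 2

# The actual values of edit_distance(a, c) in the two examples were 1 and 2.
for d_ac in (1, 2):
    assert lower <= d_ac <= upper
```

A BK-tree relies on the same reasoning: when searching for strings within distance r of a query, a subtree whose edge label k satisfies |k − d| > r, where d is the distance from the query to the current node, can be skipped without computing any further distances.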

    Hope this helps!