Search code examples
algorithmlevenshtein-distance

Why Levenstein distance is calculated without the last insertion


I have been learning a Levenshtein distance today, and have stummbled upon what it seems to be a strange way of calculation in a well-known Wagner-Fisher algorithm. Please help me find where I've got it wrong.

First string is Max.

Second is Annas.

A matrix of transformation would be the following:

Levenshtein

An algorithm reports that the Levenshtein distance from Max to Annas is 4.

So here is how I understand it should work: (given that the cost of any action is 1)

M vs A -> Replace m with a (1 action so far)

A vs N -> Replace a with n (2 actions so far)

X vs N -> Replace x with n (3 actions so far)

After this, we just need to be adding the letters that are left a and s, which take 2 more action from us, resulting in 5 total, not 4.

I see that it is going along a simpler path probably, but what is it? Please explain it's algorithms logic of operation.

Thanks.


Solution

  • You reconstructed the wrong path. It actually replaces 'm' with 'a', inserts 'n', inserts 'n', matches 'a' with 'a', and inserts 's'^H^H^H^H replaces 'x' with 's'. Cost 4.

    edit: I'm a little too glib to put it like that. There are several cost-4 paths, but they all involve going from ("m","ann") to ("ma","anna").

    edit again: I think the key insight is that not all actions cost the same; matches are free. That's why the cost grid looks like it does: ("ma","anna") costs 3 because you can get there from ("m", "ann"), which costs 3, by spending 0. ("max","annas") costs 4 since you can get there from ("ma","anna"), which costs 3, by spending 1.