I have been learning a Levenshtein distance today, and have stummbled upon what it seems to be a strange way of calculation in a well-known Wagner-Fisher algorithm. Please help me find where I've got it wrong.
First string is Max
.
Second is Annas
.
A matrix of transformation would be the following:
An algorithm reports that the Levenshtein distance from Max to Annas is 4.
So here is how I understand it should work: (given that the cost of any action is 1)
M
vs A
-> Replace m
with a
(1 action so far)
A
vs N
-> Replace a
with n
(2 actions so far)
X
vs N
-> Replace x
with n
(3 actions so far)
After this, we just need to be adding the letters that are left a and s, which take 2 more action from us, resulting in 5 total, not 4.
I see that it is going along a simpler path probably, but what is it? Please explain it's algorithms logic of operation.
Thanks.
You reconstructed the wrong path. It actually replaces 'm' with 'a', inserts 'n', inserts 'n', matches 'a' with 'a', and inserts 's'^H^H^H^H replaces 'x' with 's'. Cost 4.
edit: I'm a little too glib to put it like that. There are several cost-4 paths, but they all involve going from ("m","ann") to ("ma","anna").
edit again: I think the key insight is that not all actions cost the same; matches are free. That's why the cost grid looks like it does: ("ma","anna") costs 3 because you can get there from ("m", "ann"), which costs 3, by spending 0. ("max","annas") costs 4 since you can get there from ("ma","anna"), which costs 3, by spending 1.