Search code examples
levenshtein-distance

Why Levenshtein distance from "abcd" to "badc" is 3?


Starting from "abcd", I want to go to "badc" :

  • I remove "a" -> "bcd";
  • I insert "a" at the right position -> "bacd";
  • I remove "c" -> "bad";
  • I insert "c" at the right position -> "badc".

It's 4 operations. I can't find out a shorter way to do it. However, Levenshtein distance returns me a cost of 3. Why is that?

Thanks for your response.


Solution

  • You can go from abcd to badc in 3 operations. Remember you can do insertions, deletions and substitutions:

    1. Remove a -> bcd.
    2. Substitute c with a -> bad.
    3. Add c at the end -> badc.

    More generally, an insert + remove pair of operations can be coalesced into a substitution if the insertion and the removal are adjacent.