Search code examples
algorithmstringlevenshtein-distance

How to modify Levenshteins Edit Distance to count "adjacent letter exchanges" as 1 edit


I'm playing around with Levenshteins Edit Distance algorithm, and I want to extend this to count transpositions -- that is, exchanges of adjacent letters -- as 1 edit. The unmodified algorithm counts insertions, deletes or substitutions needed to reach a certain string from another. For instance, the edit distance from "KITTEN" to "SITTING" is 3. Here's the explanation from Wikipedia:

  1. kitten → sitten (substitution of 'k' with 's')
  2. sitten → sittin (substitution of 'e' with 'i')
  3. sittin → sitting (insert 'g' at the end).

Following the same method, the edit distance from "CHIAR" to "CHAIR" is 2:

  1. CHIAR → CHAR (delete 'I')
  2. CHAR → CHAIR (insert 'I')

I would like to count this as "1 edit", since I only exchange two adjacent letters. How would I go about to do this?


Solution

  • You need one more case in the algorithm from Wikipedia:

    if s[i] = t[j] then 
      d[i, j] := d[i-1, j-1]
    else if i > 0 and j > 0 and s[i] = t[j - 1] and s[i - 1] = t[j] then
      d[i, j] := minimum
                 (
                   d[i-2, j-2] + 1 // transpose
                   d[i-1, j] + 1,  // deletion
                   d[i, j-1] + 1,  // insertion
                   d[i-1, j-1] + 1 // substitution
                 )
    else
      d[i, j] := minimum
                 (
                   d[i-1, j] + 1,  // deletion
                   d[i, j-1] + 1,  // insertion
                   d[i-1, j-1] + 1 // substitution
                 )