Search code examples
rubyalgorithmlevenshtein-distanceedit-distance

Separately counting the number of deletions in the Levenshtein distance algorithm


So I'm aware that Levenshtein Distance algorithm takes into account the minimum number of deletions, insertions and substitutions required to change a String A into String B. But, I was wondering how you can separately keep track of number of deletions in the total edits required to make the change. I was looking at this implementation of the algorithm,

def levenshtein(first, second)
    first = first.split
    second = second.split
    first_size = first.size
    second_size = second.size
    matrix = [(0..first_size).to_a]
    (1..second_size ).each do |j|
        matrix << [j] + [0] * (first_size)
    end
    count = 0
    (1..second_size).each do |i|
       (1..first_size).each do |j|
         if first[j-1] == second[i-1]
           matrix[i][j] = matrix[i-1][j-1]
         else
           matrix[i][j] = [matrix[i-1][j],matrix[i][j-1], matrix[i-1][j-1]].min + 1
         end
       end
    end
    return matrix.last.last 
end

So in order to keep track of deletions, I tried:

if matrix[i-1[j] == [matrix[i-1][j],matrix[i][j-1], matrix[i-1][j-1]].min

then increase the count. But, this doesn't seem to work. I also tried to get the difference in size for two strings but it fails for the following case

String 1: "my response to prompt#1"
String 2: "my edited response to"

There is clearly 1 deletion here but simply getting the difference in size won't detect so.

I was wondering if anyone knows how to keep track of number of deletions that were involved in the total edits for changing string A into string B.


Solution

  • We can make the deletion count ride along with the number of substitutions by making each entry of the table a list comprised of the two quantities. (As a side effect, the secondary optimization goal is to minimize the number of deletions. I don't know whether this is desirable or not.)

    def levenshtein(first, second)
        first = first.split
        second = second.split
        first_size = first.size
        second_size = second.size
        matrix = [(0..first_size).to_a]
        (1..second_size ).each do |j|
            matrix << [[j,0]] + [[0,0]] * (first_size)
        end
        count = 0
        (1..second_size).each do |i|
           (1..first_size).each do |j|
             if first[j-1] == second[i-1]
               matrix[i][j] = matrix[i-1][j-1]
             else
               matrix[i][j] = [[matrix[i-1][j  ][0]+1, matrix[i-1][j  ][1]  ],
                               [matrix[i  ][j-1][0]+1, matrix[i  ][j-1][1]+1],
                               [matrix[i-1][j-1][0]+1, matrix[i-1][j-1][1]  ]].min
             end
           end
        end
        return matrix.last.last 
    end