How would I go about displaying detailed distance between words. For example, the output of the program could be:
Words are "car" and "cure":
Replace "a" with "u".
Add "e".
The Levenshtein distance does not fulfill my needs (I think).
Try the following. The algorithm is roughly following Wikipedia (Levenshtein distance). The language used below is ruby
Use as an example, the case of changing s
into t
as follows:
s = 'Sunday'
t = 'Saturday'
First, s
and t
are turned into arrays, and an empty string is inserted at the beginning. m
will eventually be the matrix used in the argorithm.
s = ['', *s.split('')]
t = ['', *t.split('')]
m = Array.new(s.length){[]}
m
here, however, is different from the matrix given if the algorithm in wikipedia for the fact that each cell includes not only the Levenshtein distance, but also the (non-)operation (starting, doing nothing, deletion, insertion, or substitution) that was used to get to that cell from an adjacent (left, up, or upper-left) cell. It may also include a string describing the parameters of the operation. That is, the format of each cell is:
[Levenshtein distance, operation(, string)]
Here is the main routine. It fills in the cells of m
following the algorithm:
s.each_with_index{|a, i| t.each_with_index{|b, j|
m[i][j] =
if i.zero?
[j, "started"]
elsif j.zero?
[i, "started"]
elsif a == b
[m[i-1][j-1][0], "did nothing"]
else
del, ins, subs = m[i-1][j][0], m[i][j-1][0], m[i-1][j-1][0]
case [del, ins, subs].min
when del
[del+1, "deleted", "'#{a}' at position #{i-1}"]
when ins
[ins+1, "inserted", "'#{b}' at position #{j-1}"]
when subs
[subs+1, "substituted", "'#{a}' at position #{i-1} with '#{b}'"]
end
end
}}
Now, we set i
, j
to the bottom-right corner of m
and follow the steps backwards as we unshift the contents of the cell into an array called steps
, until we reach the start.
i, j = s.length-1, t.length-1
steps = []
loop do
case m[i][j][1]
when "started"
break
when "did nothing", "substituted"
steps.unshift(m[i-=1][j-=1])
when "deleted"
steps.unshift(m[i-=1][j])
when "inserted"
steps.unshift(m[i][j-=1])
end
end
Then we print the operation and the string of each step unless that is a non-operation.
steps.each do |d, op, str=''|
puts "#{op} #{str}" unless op == "did nothing" or op == "started"
end
With this particular example, it will output:
inserted 'a' at position 1
inserted 't' at position 2
substituted 'n' at position 2 with 'r'