I am studying the Stanford NLP course by the following slides:https://web.stanford.edu/class/cs124/lec/med.pdf. The definition of edit Distance algorithm in this slide as following:
Initialization
D(i,0) = i
D(0,j) = j
Recurrence Relation:
For each i = 1…M
For each j = 1…N
D(i,j)= min {D(i-1,j) + 1, D(i,j-1) + 1,
D(i-1,j-1) + 2(if X(i) ≠ Y(j) )
0(if X(i) = Y(j))}
why D(i-1,j-1) + 2 not (+1), if X(i) ≠ Y(j). I found the definition of edit Distance algorithm in the wikipedia is '+1':https://en.wikipedia.org/wiki/Levenshtein_distance. Could you guys explain the difference of these two definition, please. I am a new guy to NLP. Thanks!
When editing one string, what is the minimal number of changes you need to do in order to get another string?
This is a general, not specific, definition for editing distance. To get a precise definition, you need to define what a "change" is.
Example: how many changes do you need to turn feh
into fah
?
e
by a
e
; then add a
in the same placeBoth answers are useful, and lead to two slightly different definitions of editing distance.