This piece of code returns the Levenshtein edit distance of 2 terms. How can i make this so that insertion and deletion only costs 0.5 instead of 1 ? substitution should still costs 1.
def substCost(x,y):
if x == y:
return 0
else:
return 1
def levenshtein(target, source):
i = len(target); j = len(source)
if i == 0:
return j
elif j == 0:
return i
return(min(levenshtein(target[:i-1],source)+1,
levenshtein(target, source[:j-1])+1,
levenshtein(target[:i-1], source[:j-1])+substCost(source[j-1],target[i-1])))
There are two places you need to account for the reduced cost of adding or removing a vowel. They are the return j
and return i
lines in the base cases of your function, and the +1
s in the min
call after the first two recursive calls.
We need to change each of those to use a "ternary" expression: 0.5 if ch in 'aeiou' else 1
in place of assuming a cost of 1
per character added or removed.
For the base cases, we can replace the return values with sum
calls on a generator expression that includes the ternary expression:
if i == 0:
return sum(0.5 if ch in 'aeiou' else 1 for ch in source)
elif j == 0:
return sum(0.5 if ch in 'aeiou' else 1 for ch in target)
For later cases, we can replace the +1
with the ternary expression itself (with an index rather than the ch
iteration variable):
return min(levenshtein(target[:i-1],source) + (0.5 if target[-1] in 'aeiou' else 1),
levenshtein(target, source[:j-1]) + (0.5 if source[-1] in 'aeiou' else 1),
levenshtein(target[:i-1], source[:j-1])+substCost(source[j-1],target[i-1]))
If you wanted to generalize this, you could move the ternary expression out into its own function, named something like addCost
, and call it from the code in the levenshtein
function.