Let's say that we have two examples:
one:
a = "Six d.o.g.s."
b = "six d.o.g.s"
two:
c = "Death Disco"
d = "deathdisco"
e = "deathdisco666"
In both examples the strings are slightly different. In the first, a has one more dot than b; in the second, d has no space between the words. Some of the strings are also lowercase.
Objective:
For the given a and b, we want a comparison in the spirit of a.lower() == b.lower() to give True if the two strings differ by an "error" of no more than two letters.
For c and d, it should also give True, since the "error" is only one space.
But for c and e, although e is only two letters longer than c, three letters are different, so it should not match.
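To make the gap concrete, here is a minimal check, assuming the example strings above, showing that a plain case-insensitive equality test is not enough on its own. The variable names exact_ab and exact_cd are only for illustration.

```python
a = "Six d.o.g.s."
b = "six d.o.g.s"
c = "Death Disco"
d = "deathdisco"

# Lowercasing removes the case difference, but the extra dot
# and the missing space still make the strings unequal.
exact_ab = a.lower() == b.lower()
exact_cd = c.lower() == d.lower()

print(exact_ab)  # False
print(exact_cd)  # False
```

So some notion of "number of differing letters" is needed on top of the lowercasing.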
How can I do this in Python? With a regex, or is there a library for this purpose?
Following minitech's comment, here is the code I found:
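    def levenshtein(seq1, seq2):
        # Row-by-row dynamic programming. Index -1 of each row holds the
        # boundary cell (the cost against an empty prefix of seq2).
        oneago = None
        thisrow = list(range(1, len(seq2) + 1)) + [0]
        for x in range(len(seq1)):
            twoago, oneago, thisrow = oneago, thisrow, [0] * len(seq2) + [x + 1]
            for y in range(len(seq2)):
                delcost = oneago[y] + 1                         # drop seq1[x]
                addcost = thisrow[y - 1] + 1                    # add seq2[y]
                subcost = oneago[y - 1] + (seq1[x] != seq2[y])  # replace, free if equal
                thisrow[y] = min(delcost, addcost, subcost)
        return thisrow[len(seq2) - 1]

    # a and b are the strings from example one; lowercase them so case is ignored
    print(levenshtein(a.lower(), b.lower()) < 2)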
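As a sketch of how the distance could be turned into the tolerance check described in the objective, the snippet below wraps a textbook two-row edit distance in a small helper and runs it on all three pairs. The names edit_distance and within_errors, and the max_errors=2 default, are assumptions for illustration and are not part of the code I found.

```python
def edit_distance(s1, s2):
    """Levenshtein distance using the standard two-row dynamic programme."""
    previous = list(range(len(s2) + 1))  # distance from "" to each prefix of s2
    for i, ch1 in enumerate(s1, start=1):
        current = [i]  # distance from s1[:i] to ""
        for j, ch2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,                 # delete ch1
                current[j - 1] + 1,              # insert ch2
                previous[j - 1] + (ch1 != ch2),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def within_errors(s1, s2, max_errors=2):
    # Hypothetical helper: case-insensitive match allowing up to max_errors edits.
    return edit_distance(s1.lower(), s2.lower()) <= max_errors


a, b = "Six d.o.g.s.", "six d.o.g.s"
c, d, e = "Death Disco", "deathdisco", "deathdisco666"

print(within_errors(a, b))  # True  (one missing dot)
print(within_errors(c, d))  # True  (one missing space)
print(within_errors(c, e))  # False (missing space plus "666")
```

Note that under this definition c and e are four edits apart after lowercasing (one space removed, three characters added), so they fail the check for any tolerance below four.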