I would like to compare words that are in two different lists, so for example, I have:
['freeze','dog','difficult','answer'] and another list
['freaze','dot','dificult','anser']. I want to compare the words in these lists and give marks for incorrect letters. So, +1 for being correct, and -1 for one letter wrong. To give some context, in a spelling test, the first list would be the answer key, and the second list would be the student's answers. How would I go about doing this?
Assuming the two lists are the same length and you have some function grade(a,b), where a and b are strings:
key = ['freeze','dog','difficult','answer']
ans = ['freaze','dot','dificult','anser']
pairs = zip(key, ans)
score = sum(grade(k,v) for (k,v) in pairs)
A possible grading function would be:
def grade(a,b):
    return 1 if a == b else -1
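Putting the two pieces together, a minimal runnable sketch might look like this. It assumes Python 3, and the `per_word` list is only added here for illustration:

```python
def grade(a, b):
    # +1 for an exact match, -1 for anything else
    return 1 if a == b else -1

key = ['freeze', 'dog', 'difficult', 'answer']
ans = ['freaze', 'dot', 'dificult', 'anser']

# Per-word marks, in the same order as the answer key
per_word = [grade(k, v) for k, v in zip(key, ans)]
score = sum(per_word)

print(per_word)
print(score)
```

Every word in the sample is misspelled, so each one gets -1 and the total is -4. Keep in mind that zip stops at the shorter list, so if the student skipped a word at the end it is silently ignored rather than penalized.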
A grading function that penalizes each wrong character and gives 1 point for a correct spelling (that sounds harsh...) might be:
def grade(a,b):
    # matching positions minus the longer length: 0 if identical, negative otherwise
    score = sum(x == y for (x,y) in zip(a,b)) - max(len(a), len(b))
    return score if score else 1
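To see what that does to the sample lists, here is a small self-contained sketch. It assumes Python 3; the loop variables are named `x, y` so they don't shadow the arguments, and `marks` is just an illustrative name:

```python
def grade(a, b):
    # Count positions where the letters agree, then subtract the longer length,
    # so every mismatched or missing position costs one point.
    score = sum(x == y for x, y in zip(a, b)) - max(len(a), len(b))
    # A perfect match gives score == 0, which is turned into +1.
    return score if score else 1

key = ['freeze', 'dog', 'difficult', 'answer']
ans = ['freaze', 'dot', 'dificult', 'anser']

# Map each key word to the mark the student's spelling earned
marks = {k: grade(k, v) for k, v in zip(key, ans)}
print(marks)
```

Because the comparison is positional, one dropped letter shifts everything after it: 'dificult' scores -6 and 'anser' scores -3 even though each is only one letter short. That is the main reason to consider an edit distance instead.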
If you want the Levenshtein distance, you would probably want your grade function to be a wrapper around the following, which was found on Wikibooks and appears to be reasonably efficient:
def levenshtein(seq1, seq2):
    oneago = None
    # First row of the table; the trailing 0 is read through index -1 as the left border.
    # Python 3: range() is lazy, so wrap it in list() before concatenating.
    thisrow = list(range(1, len(seq2) + 1)) + [0]
    for x in range(len(seq1)):
        twoago, oneago, thisrow = oneago, thisrow, [0] * len(seq2) + [x + 1]
        for y in range(len(seq2)):
            delcost = oneago[y] + 1
            addcost = thisrow[y - 1] + 1
            subcost = oneago[y - 1] + (seq1[x] != seq2[y])
            thisrow[y] = min(delcost, addcost, subcost)
    return thisrow[len(seq2) - 1]
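One way to write that wrapper, as a sketch only: give +1 for an exact match and otherwise subtract one point per edit. The scoring rule is an assumption about what you want, and the compact `levenshtein` below is a standard row-by-row dynamic-programming version in Python 3, included so the snippet runs on its own; it is not the Wikibooks code.

```python
def levenshtein(s, t):
    # prev[j] holds the distance between the processed prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]

def grade(a, b):
    # +1 for a correct spelling, otherwise -1 per edit needed to fix it
    dist = levenshtein(a, b)
    return 1 if dist == 0 else -dist

key = ['freeze', 'dog', 'difficult', 'answer']
ans = ['freaze', 'dot', 'dificult', 'anser']

score = sum(grade(k, v) for k, v in zip(key, ans))
print(score)
```

Each sample misspelling is exactly one edit away from the key, so every word costs one point here, which is closer to "-1 for one letter wrong" than a positional comparison.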
You could also take a look at difflib to do more complicated stuff.
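For instance, difflib.SequenceMatcher gives a similarity ratio between 0 and 1, which could be used for partial credit. A minimal sketch, assuming Python 3; the `similarity` helper is a hypothetical name and not part of difflib:

```python
import difflib

def similarity(a, b):
    # ratio() is 1.0 for identical strings and 0.0 when nothing matches
    return difflib.SequenceMatcher(None, a, b).ratio()

key = ['freeze', 'dog', 'difficult', 'answer']
ans = ['freaze', 'dot', 'dificult', 'anser']

for k, v in zip(key, ans):
    print(k, v, round(similarity(k, v), 2))
```

How you turn the ratio into marks (a threshold, or scaling it to points) is up to you.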