Search code examples
pythonrpython-3.xhamming-distance

modify edit distance algorithm between two text strings in python


I am trying to add two modifications to the standard algorithm.

My strings are text and case sensitive.

Say I have a word "cage". The hamming distance between "cage" and "Cage" would be 0 (first letter). Any other letter would be 0.5. (say, "cage" and "cAge".

Two, "cage" and "caKe" would be 1.5 (different letter=1 plus different caps =0.5), Three, "cake" and "caqe" would be 0 (consider k and q to be same letter).

Same rules apply for longs sentences too. (say "Happy Birthday" and "happy BudthDay" distance = 1+1+0.5=2.5)

I would like to pass in any set of words/sentences and modified algorithm instead of standard algorithm needs to be applicable.

I have written a sample code in python for case 1 but unable to understand how to move forward with capitalization.

 def editDistance(str1, str2):  if str1[1]==str2[1]:
            return editDistance(str1,str2)
 print editDistance(str1, str2, len(str1), len(str2))

PS: Any explanation in R would be great too.


Solution

  • check this code out - I have put comments too against it for explanation.

    def editDistance(str1, str2):
        if (str1 == str2): # if both strings equal, print 0
            print 0
        else:
            counter = 0
            for c in range(1, len(str1)-1): # iterate through each character in string
                if (str1[c] == str2[c]): # if characters are equal, don't increment counter
                    counter += 0
                elif (((str1[c].lower()) == str2[c]) or ((str2[c].lower()) == str1[c])):
                    counter += 0.5 # if the lowercase of characters are equal, increment 0.5
                elif ((str1[c].islower()) and (str2[c].islower())):
                    counter += 1 # else if both unequal, both lowercase, increment 1
                elif ((str1[c].isupper()) and (str2[c].isupper())):
                    counter += 1 # else if both unequal, both uppercase, increment 1
                else:
                    counter += 1.5 # reaches here if both unequal and different case, so 1.5
            print counter
    
    editDistance(str1, str2); # call the function with the strings
    

    I am not sure why you were calling the function with the length of strings twice. I have tried this and it works as you expected. Hope this helps!