Search code examples
hamming-distanceedit-distance

Modification to Hamming distance/ Edit Distance


I am having trouble modifying the Hamming distance algorithm in order to affect my data in two ways

  1. Add .5 to the Hamming distance if a capital letter is switched for a lower case letter unless it is in the first position.
    Examples include: "Killer" and "killer" have a distance of 0 "killer" and "KiLler" have a Hamming distance of .5. "Funny" and FAnny" have a distance of 1.5 (1 for the different letter, additional .5 for the different capitalization).

  2. Making it so that b and d (and their capitalized counterparts) are seen as the same thing

Here is the code i have found that makes up the basic Hamming program

def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'mark'
    b = 'Make'
    print hamming_distance(a, b) 

Any suggestions would be welcomed!


Solution

  • Here is a simple solution. For sure it could be optimized for better performance.

    Note: I used Python 3, since Python 2 will retire soon.

    def hamming_distance(s1, s2):
        assert len(s1) == len(s2)
        # b and d are interchangeable
        s1 = s1.replace('b', 'd').replace('B', 'D')
        s2 = s2.replace('b', 'd').replace('B', 'D')
        # add 1 for each different character
        hammingdist = sum(ch1 != ch2 for ch1, ch2 in zip(s1.lower(), s2.lower()))
        # add .5 for each lower/upper case difference (without first letter)
        for i in range(1, len(s1)):
            hammingdist += 0.5 * (s1[i] >= 'a' and s1[i] <= 'z' and\
                                  s2[i] >= 'A' and s2[i] <= 'Z' or\
                                  s1[i] >= 'A' and s1[i] <= 'Z' and\
                                  s2[i] >= 'a' and s2[i] <= 'z')
        return hammingdist
    
    def print_hamming_distance(s1, s2):
        print("hamming distance between", s1, "and", s2, "is",
              hamming_distance(s1, s2))
    
    if __name__ == "__main__":
        assert hamming_distance('mark', 'Make') == 2
        assert hamming_distance('Killer', 'killer') == 0
        assert hamming_distance('killer', 'KiLler') == 0.5
        assert hamming_distance('bole', 'dole') == 0
        print("all fine")
        print_hamming_distance("organized", "orGanised")
        # prints: hamming distance between organized and orGanised is 1.5