I am having trouble modifying the Hamming distance algorithm in order to affect my data in two ways
Add .5 to the Hamming distance if a capital letter is switched for a lower case letter unless it is in the first position.
Examples include: "Killer" and "killer" have a distance of 0 "killer" and "KiLler" have a Hamming distance of .5. "Funny" and FAnny" have a distance of 1.5 (1 for the different letter, additional .5 for the different capitalization).
Making it so that b and d (and their capitalized counterparts) are seen as the same thing
Here is the code i have found that makes up the basic Hamming program
def hamming_distance(s1, s2):
assert len(s1) == len(s2)
return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
if __name__=="__main__":
a = 'mark'
b = 'Make'
print hamming_distance(a, b)
Any suggestions would be welcomed!
Here is a simple solution. For sure it could be optimized for better performance.
Note: I used Python 3, since Python 2 will retire soon.
def hamming_distance(s1, s2):
assert len(s1) == len(s2)
# b and d are interchangeable
s1 = s1.replace('b', 'd').replace('B', 'D')
s2 = s2.replace('b', 'd').replace('B', 'D')
# add 1 for each different character
hammingdist = sum(ch1 != ch2 for ch1, ch2 in zip(s1.lower(), s2.lower()))
# add .5 for each lower/upper case difference (without first letter)
for i in range(1, len(s1)):
hammingdist += 0.5 * (s1[i] >= 'a' and s1[i] <= 'z' and\
s2[i] >= 'A' and s2[i] <= 'Z' or\
s1[i] >= 'A' and s1[i] <= 'Z' and\
s2[i] >= 'a' and s2[i] <= 'z')
return hammingdist
def print_hamming_distance(s1, s2):
print("hamming distance between", s1, "and", s2, "is",
hamming_distance(s1, s2))
if __name__ == "__main__":
assert hamming_distance('mark', 'Make') == 2
assert hamming_distance('Killer', 'killer') == 0
assert hamming_distance('killer', 'KiLler') == 0.5
assert hamming_distance('bole', 'dole') == 0
print("all fine")
print_hamming_distance("organized", "orGanised")
# prints: hamming distance between organized and orGanised is 1.5