Starting with Python - Hamming distance of a list of strings


I'm new to Python and I'm trying to compute the Hamming distance between a pair of DNA sequences. I managed to do this for one pair, but I don't know how to obtain a list of Hamming distances for several pairs of DNA sequences. Could anyone please guide me on this?

dna1 = 'ACCTAT'
dna2 = 'CATTGA'

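def distance(strand_a, strand_b):
    # The Hamming distance is only defined for strings of equal length.
    if len(strand_a) != len(strand_b):
        raise ValueError("The strings are not the same length")
    i = 0
    n = 0
    while i < len(strand_a):
        # Count each position where the two strands differ.
        if strand_a[i] != strand_b[i]:
            n += 1
        i += 1
    return n

print("The distance is:", distance(dna1, dna2))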

Output:

The distance is: 5

I would like to know the best way to obtain a list of Hamming distances for three pairs of DNA sequences. I tried adapting the code above, but I haven't been able to find a solution.

Given these two lists, I want to get the Hamming distance between the 1st, 2nd and 3rd pairs of DNA sequences:

dna1 = ['ACTGG','ATGCA','AACTG']
dna2 = ['ACTGA','ATGGG','ATGAC']

Where the output would be:

distances = [1, 2, 4]

Thank you all for your help!


Solution

  • You can try:

    import numpy as np
    
    dna1 = ['ACTGG','ATGCA','AACTG']
    dna2 = ['ACTGA','ATGGG','ATGAC']
    
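    # Compare each pair element-wise, count the mismatches, and convert the
    # NumPy integer to a plain int so the list prints as [1, 2, 4].
    distances = [int((np.array(list(x)) != np.array(list(y))).sum()) for x, y in zip(dna1, dna2)]
    print(distances)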
    

    It gives:

    [1, 2, 4]
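
If you would rather not depend on NumPy, the same pairing idea can be sketched in plain Python. This is only one possible approach, not the answer's own method: it keeps the `distance` name and the length check from the question, but replaces the `while` loop with `sum` over `zip`, which is my substitution.

```python
def distance(strand_a, strand_b):
    """Count the positions at which two equal-length strings differ."""
    if len(strand_a) != len(strand_b):
        raise ValueError("The strings are not the same length")
    # Each comparison is True (1) or False (0), so sum() counts the mismatches.
    return sum(a != b for a, b in zip(strand_a, strand_b))


dna1 = ['ACTGG', 'ATGCA', 'AACTG']
dna2 = ['ACTGA', 'ATGGG', 'ATGAC']

# Pair the 1st, 2nd and 3rd sequences of each list and measure each pair.
distances = [distance(x, y) for x, y in zip(dna1, dna2)]
print(distances)  # [1, 2, 4]
```

Note that `zip` silently stops at the shorter list if `dna1` and `dna2` differ in length. On Python 3.10 or later, `zip(dna1, dna2, strict=True)` raises a `ValueError` in that case instead.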