
Count of Same Objects in Two Sequences


I'm having a real hard time trying to learn Python 3 and right now I'm struggling with this one exercise.

I have to write a function that takes two arguments:

1) A string which is a DNA sequence.

2) A string of the same length as argument one (also a DNA sequence).

The function must return a float (the proportion of bases that are the same in two DNA sequences).

So, I know I have to write a function that will return something like this:

sequence_similarity("AGTC","AGTT")

should return

0.75

I've only got this far, and I'm stuck before I've really started:

def sequence_similarity(seq1,seq2):
    seq1="AGTC"
    seq2="AGTT"

Can you help me get started?


Solution

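  • You can use sum with a generator expression that compares the two sequences position by position. Each comparison is True or False, and sum counts every True as 1: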

    sum(x==y for (x,y) in zip(seq1, seq2))
    

    This evaluates to 3 for your two strings, "AGTC" and "AGTT", because three of the four positions match.

    Then divide by the length to get the proportion:

    sum(x==y for (x,y) in zip(seq1, seq2))/len(seq1)
    

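    Watch out for integer division if you are on Python 2.x, where / between two ints truncates (3/4 gives 0). Convert the length to a float first: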

    sum(x==y for (x,y) in zip(seq1, seq2))/float(len(seq1))
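
Putting those pieces together, here is a minimal sketch of the whole function, assuming Python 3 (so / is true division). The two ValueError checks are my own additions and not part of the exercise: zip silently stops at the shorter string, and an empty string would cause a division by zero, so I chose to reject both cases.

```python
def sequence_similarity(seq1, seq2):
    """Return the proportion of positions where seq1 and seq2 have the same base."""
    # Assumption: unequal lengths are an error (zip would silently truncate).
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    # Assumption: empty input is an error (it would divide by zero below).
    if not seq1:
        raise ValueError("sequences must not be empty")
    # Each x == y is True or False; sum counts every True as 1.
    matches = sum(x == y for x, y in zip(seq1, seq2))
    # Python 3: / between ints returns a float.
    return matches / len(seq1)


print(sequence_similarity("AGTC", "AGTT"))  # 0.75
```

Note that the sequences are parameters here. Assigning seq1 and seq2 inside the function body, as in the question, would overwrite whatever the caller passed in.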