Search code examples
python-3.xfunctionjupyter-notebookbioinformaticsdna-sequence

Creating a function that takes 2 DNA sequences, checks if they're the same length, if they're valid sequences and what type of mutation they have


I am trying to create a function that takes 2 DNA sequences as arguments and checks for 3 things (in this order):

  1. compares them to see if they are both the same length
  2. checks if they are both valid sequences (they must only contain the characters 'ACGT')
  3. Detrmines what type of mutation (transition or transversion) has occured between the two

I'm very new to coding so I'm not sure where my mistake is. I get no output from my code and when I do, it's an error. Any advice?

Here's what I've got so far:

dna_bases = ['ACGT']
ct = ['CT']
ag = ['AG']

def dna_comparison(seq1, seq2):
    if len(seq1) != len(seq2):
        print("Error: The sequences are different lengths.")
    else:
        for i in range(len(seq1)):
            for ii in range(len(seq2)):
                if seq1[i] or seq2[ii] not in dna_bases:
                    return False
                    break
                else:
                    print("Valid sequences. The program will continue")                    
                    transitions = 0
                    transversions = 0 
                    for i in range(len(seq1)):
                        if seq1[i] != seq2[i]:
                            if seq1[i] in ag and seq2[i] in ag:
                                transitions =+ 1
                            elif seq1[i] in ct and seq2[i] in ct:
                                transitions =+ 1
                            else:
                                transversion += 1
                    print (transitions)
                    print (transversions)

Solution

  • IIUC, you can do:

    def check(seq1, seq2):
        if len(seq1) != len(seq2):
            raise ValueError("Sequences not the same length")
    
        if set(seq1).union(seq2).difference("ACGT"):
            raise ValueError("Sequences contain invalid character")
    
        transitions = 0
        transversions = 0
        for s1, s2 in zip(seq1, seq2):
            if s1 == s2:
                continue
            if (s1 in "CT" and s2 in "CT") or (s1 in "AG" and s2 in "AG"):
                transitions += 1
            else:
                transversions += 1
    
        return transitions, transversions
    
    
    seq1 = "ACGTACGTCCACGT"
    seq2 = "TTTACCGACGTACG"
    
    transitions, transversions = check(seq1, seq2)
    print(f"{transitions=} {transversions=}")
    

    Prints:

    transitions=1 transversions=10