Tags: python, database, keyerror

KeyError: 'mtD' when 'mtD' is nowhere to be found in the relevant code


I'm using a simple function to convert a DNA sequence into an amino acid sequence. At a high level the code seems fine, but whenever I run the program I get KeyError: 'mtD', apparently raised at line 26 (if table[seq[i:i+3]] == "_" :). The only other place 'mtD' shows up in my program is when I print my datasets to the console, which makes the problem even more puzzling. My code is below.

#Creating the protein sequence column for the data
Protein_Sequence = []

#dna to protein sequence function
def translate11(seq): 
  table = {"TTT" : "F", "CTT" : "L", "ATT" : "I", "GTT" : "V",
           "TTC" : "F", "CTC" : "L", "ATC" : "I", "GTC" : "V",
           "TTA" : "L", "CTA" : "L", "ATA" : "I", "GTA" : "V",
           "TTG" : "L", "CTG" : "L", "ATG" : "M", "GTG" : "V",
           "TCT" : "S", "CCT" : "P", "ACT" : "T", "GCT" : "A",
           "TCC" : "S", "CCC" : "P", "ACC" : "T", "GCC" : "A",
           "TCA" : "S", "CCA" : "P", "ACA" : "T", "GCA" : "A",
           "TCG" : "S", "CCG" : "P", "ACG" : "T", "GCG" : "A",
           "TAT" : "Y", "CAT" : "H", "AAT" : "N", "GAT" : "D",
           "TAC" : "Y", "CAC" : "H", "AAC" : "N", "GAC" : "D",
           "TAA" : "_", "CAA" : "Q", "AAA" : "K", "GAA" : "E",
           "TAG" : "_", "CAG" : "Q", "AAG" : "K", "GAG" : "E",
           "TGT" : "C", "CGT" : "R", "AGT" : "S", "GGT" : "G",
           "TGC" : "C", "CGC" : "R", "AGC" : "S", "GGC" : "G",
           "TGA" : "_", "CGA" : "R", "AGA" : "R", "GGA" : "G",
           "TGG" : "W", "CGG" : "R", "AGG" : "R", "GGG" : "G" 
           }
  pro_sequence =" "

  for i in range(0, len(seq)-(3+len(seq)%3), 3):
    if table[seq[i:i+3]] == "_" :
        break
    pro_sequence += table[seq[i:i+3]]

     
  return pro_sequence

newthang = df.mtDNA_Sequence
for thang in newthang:
  x = translate11(thang)
  Protein_Sequence.append(x)
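To see how a KeyError can name a key that never appears in the code, here is a minimal reproduction. It is a sketch that assumes one of the column values is not a nucleotide string; the value "mtDNA_Sequence" used below is a hypothetical example (for instance a header line that was read in as data). The codon table is generated compactly and is meant to be equivalent to the one in the question, and the loop is the question's loop unchanged.

```python
# Standard codon table built compactly: the first, second and third base each
# iterate over "TCAG"; "_" marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY__CC_W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


def translate11(seq):
    # Same loop as in the question: each slice seq[i:i+3] is used as a dict key.
    pro_sequence = " "
    for i in range(0, len(seq) - (3 + len(seq) % 3), 3):
        if TABLE[seq[i:i + 3]] == "_":
            break
        pro_sequence += TABLE[seq[i:i + 3]]
    return pro_sequence


# Hypothetical bad value whose first three characters are "mtD".
try:
    translate11("mtDNA_Sequence")
except KeyError as err:
    # The missing key is the slice of the input, not a name in the code.
    print("KeyError:", err)  # prints: KeyError: 'mtD'
```

The key in the error message comes from the data being sliced, which is why it does not appear anywhere in the source.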

Solution

  • Your function worked for me: I tried it with a short nucleotide sequence and it gave an appropriate translation.

    The for loop stops one codon early, so the last amino acid is dropped. You can fix that by removing the 3+:

    for i in range(0, len(seq)-(len(seq)%3), 3):
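A quick way to check the off-by-one is to compare the codon start positions that each range produces. This is a small sketch; starts_original and starts_fixed are hypothetical helpers that only wrap the two range expressions.

```python
def starts_original(n):
    # Range from the question: stops one codon early.
    return list(range(0, n - (3 + n % 3), 3))


def starts_fixed(n):
    # Suggested fix: visits every complete codon, ignores trailing bases.
    return list(range(0, n - n % 3, 3))


for n in (9, 10, 11):
    print(n, starts_original(n), starts_fixed(n))
# 9 [0, 3] [0, 3, 6]
# 10 [0, 3] [0, 3, 6]
# 11 [0, 3] [0, 3, 6]
```

For a 9-base sequence such as ATGGCCTGG, the original range reads only two of its three codons.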
    

    And when you declare pro_sequence, start with an empty string "" instead of a space character " "; otherwise every translation begins with a leading space.

    After these small changes, I tried the following:

    sequence = "tactgtggctactcagctgtgcgcatggcccgcctgctgtcaccaggggcgaggctcatcaccatcgagatcaaccccgactgtgccgccatcacccagcggatggtggatttcgctggcatgaaggacaag"
    print(translate11(sequence.upper()))
    
    # YCGYSAVRMARLLSPGARLITIEINPDCAAITQRMVDFAGMKDK
    

    That is the correct translation.

    So at least one of the values you pass to your function (from df.mtDNA_Sequence) must contain the letters "mtD" at a position the loop reads as a codon, rather than being a pure string of nucleotides.
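To find which values are responsible, it can help to list every entry that is not a plain nucleotide string. This is a minimal sketch using only the standard library; find_invalid_sequences is a hypothetical helper, and it assumes a valid sequence contains only uppercase A, C, G and T (lowercase input would also miss the uppercase codon table unless it is upper-cased first). Since "mtD" is also the first three letters of the column name mtDNA_Sequence, one possibility worth checking is a header line that ended up as a data row.

```python
import re

# Only uppercase A, C, G, T count as a valid nucleotide string here.
NUCLEOTIDES = re.compile(r"[ACGT]*")


def find_invalid_sequences(values):
    """Return (position, value) for every entry that is not a pure ACGT string."""
    bad = []
    for pos, value in enumerate(values):
        # Non-strings (e.g. None or NaN for missing data) are flagged as well.
        if not isinstance(value, str) or not NUCLEOTIDES.fullmatch(value):
            bad.append((pos, value))
    return bad


# Hypothetical column contents; with pandas you could pass df.mtDNA_Sequence.
column = ["ATGGCCTGG", "mtDNA_Sequence", "TTTAAA", None]
print(find_invalid_sequences(column))  # [(1, 'mtDNA_Sequence'), (3, None)]
```

Iterating over a pandas Series yields its values, so the reported numbers are 0-based positions rather than DataFrame index labels.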

    Try adding another condition that breaks out of the for loop if the characters aren't a recognized codon:

    for i in range(0, len(seq)-(len(seq)%3), 3):
      codon = seq[i:i+3]
      if codon not in table:  # not a recognized codon
        break
      if table[codon] == "_":  # stop codon
        break
      pro_sequence += table[codon]
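Putting the three suggestions together (an empty starting string, the corrected range, and the check for unrecognized codons), the whole function might look like the sketch below. The codon table is generated compactly and is assumed to be equivalent to the one in the question; the sample sequence and its translation are the ones from this answer.

```python
# Standard codon table: bases iterate over "TCAG"; "_" marks stop codons.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY__CC_W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


def translate11(seq):
    pro_sequence = ""                                # start empty, not with " "
    for i in range(0, len(seq) - len(seq) % 3, 3):   # visit every full codon
        codon = seq[i:i + 3]
        if codon not in TABLE:                       # not a recognized codon
            break
        if TABLE[codon] == "_":                      # stop codon
            break
        pro_sequence += TABLE[codon]
    return pro_sequence


sequence = ("tactgtggctactcagctgtgcgcatggcccgcctgctgtcaccaggggcgagg"
            "ctcatcaccatcgagatcaaccccgactgtgccgccatcacccagcggatggtg"
            "gatttcgctggcatgaaggacaag")
print(translate11(sequence.upper()))         # YCGYSAVRMARLLSPGARLITIEINPDCAAITQRMVDFAGMKDK
print(repr(translate11("mtDNA_Sequence")))   # '' (no KeyError)
```

One trade-off of breaking silently: a bad row now produces an empty or truncated protein instead of an error, so it is still worth locating and cleaning those rows in the data.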