Rosalind: translating RNA into protein in Python


Here is my solution to the Rosalind problem "Translating RNA into Protein".

def prot(rna):
  for i in xrange(3, (5*len(rna))//4+1, 4):
    rna=rna[:i]+','+rna[i:]
  rnaList=rna.split(',')
  bases=['U','C','A','G']
  codons = [a+b+c for a in bases for b in bases for c in bases]
  amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
  codon_table = dict(zip(codons, amino_acids))
  peptide=[]
  for i in range (len (rnaList)):
    if codon_table[rnaList[i]]=='*':
      break
    peptide+=[codon_table[rnaList[i]]]
  output=''
  for i in peptide:
    output+=str(i)
  return output

If I run prot('AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA'), I get the correct output 'MAMAPRTEINSTRING'. However, if the RNA sequence (the input string) is hundreds of nucleotides (characters) long, I get an error:

 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "<stdin>", line 11, in prot
 KeyError: 'CUGGAAACGCAGCCGACAUUCGCUGAAGUGUAG'

Can you point out where I went wrong?


Solution

  • Given that you have a KeyError, the problem must be in one of your attempts to access codon_table[rnaList[i]]. You are assuming each item in rnaList is three characters long, but evidently, at some point, that stops being true and one of the items is 'CUGGAAACGCAGCCGACAUUCGCUGAAGUGUAG'.

    This happens because the loop bound (5*len(rna))//4+1 is computed once, from the original length, while every rna = rna[:i]+','+rna[i:] makes the string one character longer. A fully split string is about 4/3 of the original length, so your indices i stop before the end of the string and the tail is never split. For any rna where len(rna) > 60, the last item in rnaList is therefore longer than three characters. With 63 nucleotides, for example, the loop inserts 19 commas where 20 are needed, leaving a final item of six characters. If there is a stop codon before you reach that item it isn't a problem, but if you reach it you get the KeyError.

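    I suggest you rewrite the start of your function, e.g. using the grouper recipe from the itertools documentation (shown here for Python 2, which your use of xrange implies; in Python 3, izip_longest is named zip_longest):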

    from itertools import izip_longest
    
    def grouper(iterable, n, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
        args = [iter(iterable)] * n
        return izip_longest(fillvalue=fillvalue, *args)
    
    def prot(rna):
        rnaList = ["".join(t) for t in grouper(rna, 3)]
        ...
    

    Note also that you can use

    peptide.append(codon_table[rnaList[i]])
    

    and

    return "".join(peptide)
    

    to simplify your code.
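
Putting the suggestions above together, here is a minimal sketch of the whole function. It is not the only way to do it, and it makes a few assumptions that are not in your code: it targets Python 3 (range instead of xrange), it replaces grouper with plain slicing in steps of three, it builds the codons with itertools.product, and it silently ignores a trailing partial codon. The names BASES, AMINO_ACIDS and CODON_TABLE are my own.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Same table as in the question: codons enumerated in U, C, A, G order,
# with the last base varying fastest.
CODON_TABLE = dict(
    zip(("".join(codon) for codon in product(BASES, repeat=3)), AMINO_ACIDS)
)


def prot(rna):
    """Translate an RNA string into a peptide, stopping at the first stop codon."""
    peptide = []
    # Walk the string three characters at a time. The upper bound drops a
    # trailing partial codon (assumption: incomplete codons are ignored).
    for i in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(prot("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))
```

Because the indices come from range(0, ..., 3) and the string is never modified, the input length no longer matters. A character outside U, C, A and G still raises a KeyError, which is probably what you want for malformed input.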