Tags: python, sorting, python-3.x, translation, sequences

Translate my sequence?


I have to write a script that translates a DNA sequence using this codon table:

dict = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
              "TCA":"S|Ser","TCG":"S|Ser", "TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
              "TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp", "CTT":"L|Leu","CTC":"L|Leu",
              "CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
              "CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
              "CGA":"R|Arg","CGG":"R|Arg", "ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
              "ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr", "AAT":"N|Asn","AAC":"N|Asn",
              "AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
              "GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
              "GCA":"A|Ala","GCG":"A|Ala", "GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
              "GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}

seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
a=""

for y in range( 0, len ( seq)):
    c=(seq[y:y+3])
    #print(c)
    for  k, v in dict.items():
        if seq[y:y+3] == k:
            alle_amino = v[::3] # all amino acids in a row: a1.1 - a2.1 - a3.1 - a1.2, etc.
            print (v)

With this script I get the amino acids of all three frames interleaved, one per line. How can I sort them so that all the amino acids from frame 1 are next to each other, and likewise for frame 2 and frame 3?

For example, my result should be:

+3 SerIleLeuAlaStpProLysTrpGluProProTyrValAlaStpProIleTyrIleTyrIle

+2 PheAsnThrSerMetThrLysValGlyThrProLeuArgSerMetThrHisIleTyrIleTyr

+1 PheGlnTyrStpHisAspGlnSerGlyAsnProLeuThrStpHisAspProTyrIleTyrIle

TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA

I use Python 3.

One more question: can I get this result by making some changes to my own script?
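A minimal sketch of how the script above could be changed to give this result (not part of the original answer): keep the loop over every position y, but replace the inner loop with a direct dictionary lookup and append each three-letter code to one of three strings chosen by y % 3, because position y starts a codon in frame y % 3 + 1. The table is only an excerpt of the question's dictionary and the sequence is shortened to its first 12 bases to keep the example small; codon_table and frames are illustrative names.

```python
# Excerpt of the question's codon table (only the codons used below);
# in practice use the full dictionary, renamed so it does not shadow dict.
codon_table = {"TTT": "F|Phe", "TTC": "F|Phe", "TCA": "S|Ser", "CAA": "Q|Gln",
               "AAT": "N|Asn", "ATA": "I|Ile", "TAC": "Y|Tyr", "ACT": "T|Thr",
               "CTA": "L|Leu", "TAG": "*|Stp"}

seq = "TTTCAATACTAG"

# One string per reading frame: index 0 -> +1, index 1 -> +2, index 2 -> +3.
frames = ["", "", ""]

for y in range(len(seq)):
    codon = seq[y:y + 3]
    # Near the end of seq the slice is shorter than 3 bases; such slices
    # are not keys of the table, so they are skipped.
    if codon in codon_table:
        # Position y belongs to reading frame y % 3 (0-based).
        frames[y % 3] += codon_table[codon].split("|")[1]

# Print in the same order as the expected output: +3, +2, +1.
for frame in (2, 1, 0):
    print("+%i" % (frame + 1), frames[frame])
```

This prints `+3 SerIleLeu`, `+2 PheAsnThr` and `+1 PheGlnTyrStp`; with the full dictionary and the full sequence the same loop should produce the three rows shown above.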


Solution

  • You can use the following (note that this would be much easier with Biopython's translate method):

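    # the codon table from the question, renamed from dict to dicti
    dicti = {"TTT":"F|Phe","TTC":"F|Phe","TTA":"L|Leu","TTG":"L|Leu","TCT":"S|Ser","TCC":"S|Ser",
             "TCA":"S|Ser","TCG":"S|Ser","TAT":"Y|Tyr","TAC":"Y|Tyr","TAA":"*|Stp","TAG":"*|Stp",
             "TGT":"C|Cys","TGC":"C|Cys","TGA":"*|Stp","TGG":"W|Trp","CTT":"L|Leu","CTC":"L|Leu",
             "CTA":"L|Leu","CTG":"L|Leu","CCT":"P|Pro","CCC":"P|Pro","CCA":"P|Pro","CCG":"P|Pro",
             "CAT":"H|His","CAC":"H|His","CAA":"Q|Gln","CAG":"Q|Gln","CGT":"R|Arg","CGC":"R|Arg",
             "CGA":"R|Arg","CGG":"R|Arg","ATT":"I|Ile","ATC":"I|Ile","ATA":"I|Ile","ATG":"M|Met",
             "ACT":"T|Thr","ACC":"T|Thr","ACA":"T|Thr","ACG":"T|Thr","AAT":"N|Asn","AAC":"N|Asn",
             "AAA":"K|Lys","AAG":"K|Lys","AGT":"S|Ser","AGC":"S|Ser","AGA":"R|Arg","AGG":"R|Arg",
             "GTT":"V|Val","GTC":"V|Val","GTA":"V|Val","GTG":"V|Val","GCT":"A|Ala","GCC":"A|Ala",
             "GCA":"A|Ala","GCG":"A|Ala","GAT":"D|Asp","GAC":"D|Asp","GAA":"E|Glu",
             "GAG":"E|Glu","GGT":"G|Gly","GGC":"G|Gly","GGA":"G|Gly","GGG":"G|Gly"}
    
    def translate(seq):
        """Return the dictionary entry ("X|Xxx") for each complete codon of seq."""
        aaseq = []
        # step through seq three bases at a time; a trailing partial codon is ignored
        for x in range(0, len(seq) - 2, 3):
            aaseq.append(dicti[seq[x:x+3]])
        return aaseq
    
    seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"
    
    # frame = 0, 1, 2 gives reading frames +1, +2, +3
    for frame in range(3):
        print('+%i' % (frame+1), ''.join(item.split('|')[1] for item in translate(seq[frame:])))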
    

    Note that I renamed your dictionary to dicti so that it does not shadow the built-in dict.


    Some comments to help you understand:

    translate takes your sequence and returns a list in which each item is the dictionary entry (one-letter and three-letter code) for the corresponding triplet, like:

    aaseq = ["L|Leu", "L|Leu", "P|Pro", ...]
    

    You could process this data further inside translate (keeping only the one-letter or only the three-letter code), or return it as is and process it later, as I have done.
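If you prefer to do the processing inside translate, a minimal sketch could look like the following. The code parameter (0 for the one-letter code, 1 for the three-letter code) is a hypothetical addition, not part of the answer, and the table is a four-codon excerpt of the question's dictionary; with the full dictionary it works on the whole sequence.

```python
# Excerpt of the question's codon table, enough for the example below.
dicti = {"TTT": "F|Phe", "CAA": "Q|Gln", "TAC": "Y|Tyr", "TAG": "*|Stp"}

def translate(seq, code=1):
    """Translate seq codon by codon and return one string.

    code=0 keeps the one-letter code, code=1 the three-letter code
    (hypothetical parameter for illustration).
    """
    aaseq = []
    # Step through seq three bases at a time, ignoring a trailing partial codon.
    for x in range(0, len(seq) - 2, 3):
        aaseq.append(dicti[seq[x:x + 3]].split("|")[code])
    return "".join(aaseq)

print(translate("TTTCAATACTAG"))          # three-letter codes: PheGlnTyrStp
print(translate("TTTCAATACTAG", code=0))  # one-letter codes: FQY*
```

Returning a joined string keeps the caller simple, at the cost of losing the original "X|Xxx" entries if you later need both codes.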

    translate is called in

    ''.join(item.split('|')[1] for item in translate(seq[frame:]))
    

    for each frame. For frame = 0, 1 and 2, seq[frame:] (the sequence with its first 0, 1 or 2 bases removed) is passed to translate; that is, the sequences corresponding to the three reading frames are processed one after the other. Then, in
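To see what seq[frame:] does for each reading frame, the sketch below splits the shifted sequence into complete codons and prints the first four of each frame. codons_in_frame is a hypothetical helper written for this illustration; it does not appear in the answer.

```python
seq = "TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA"

def codons_in_frame(seq, frame):
    """Split seq[frame:] into complete codons (hypothetical helper)."""
    shifted = seq[frame:]
    # the range stops early enough that every slice has exactly 3 bases
    return [shifted[x:x + 3] for x in range(0, len(shifted) - 2, 3)]

for frame in range(3):
    # show only the first four codons of each reading frame
    print("+%i" % (frame + 1), " ".join(codons_in_frame(seq, frame)[:4]), "...")
```

This prints `+1 TTT CAA TAC TAG ...`, `+2 TTC AAT ACT AGC ...` and `+3 TCA ATA CTA GCA ...`, which translate to the starts of the three rows in the question (PheGlnTyrStp, PheAsnThrSer, SerIleLeuAla).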

       item.split('|')[1]
    

    I split each entry into its one-letter and three-letter codes and take the element at index 1 (the three-letter code). The results are then joined into a single string with ''.join.
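A small worked example of the split-and-join step, using the first four entries that translate returns for frame +1 (the list is written out by hand here):

```python
aaseq = ["F|Phe", "Q|Gln", "Y|Tyr", "*|Stp"]

# "F|Phe".split("|") gives ["F", "Phe"]; index 1 is the three-letter code.
three_letter = "".join(item.split("|")[1] for item in aaseq)
# Index 0 would give the one-letter code instead.
one_letter = "".join(item.split("|")[0] for item in aaseq)

print(three_letter)  # PheGlnTyrStp
print(one_letter)    # FQY*
```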