I am trying to read the items in a list of lists three at a time (as codons), using one combination of 3 items (the start codon) to mark the beginning of a fragment and another combination of 3 items (the stop codon) to mark its end.
So the program should read each list like this:
list 1: XXXXX-start-fragment of interest-stop-XXXXXXX
I want to extract only the fragment of interest, append it to another list, and discard the rest.
This is a more concrete example:
Start codon: ATG
Stop codon: TAG
gene_1= 'ACGGACTATTC'
gene_2= 'GGCCATGAGTAACGCATAGGGCCC'
gene_3= 'GGGCCCATGACGTACTAGGGGCCCATGCATTCATAG'
So the first gene does not contain any fragment of interest, whereas the second contains one and the third contains two. I want to discard everything outside these reading frames and append the fragments of interest to a list that should look something like this:
frag_int = ['AGTAACGCA', 'ACGTAC', 'CATTCA']
This is what I have so far:
#These are str
genelist=[]
gene_1= 'A','C','G','G','A','C','T','A','T','T','C'
gene_2= 'G','G','C','C','A','T','G','A','G','T','A','A','C','G','C','A','T','A','G','G','G','C','C','C'
gene_3='G','G','G','C','C','C','A','T','G','A','C','G','T','A','C','T','A','G','G','G','G','C','C','C','A','T','G','C','A','T','T','C','A','T','A','G'
genelist.append(gene_1)
genelist.append(gene_2)
genelist.append(gene_3)
def transcription(ORF):
    mRNA= ''
    for i in range(0, len(ORF), 3):
        codon= ORF[i:i+3]
        if codon != 'ATG':
            next(codon)
        if codon == 'ATG':
            mRNA=codon.transcribe()
        if codon == 'TAG':
            break
    return(mRNA)
mRNAs=[]
for gene in genelist:
    for codon in gene:
        mRNA= transcription(codon)
        mRNAs.append(mRNA)
print(mRNAs)
But it is not really giving anything back. I wonder whether the code is too redundant and whether I really need to define a function here. Do you know a better way to do this? Thanks!
Thanks, everyone, for your comments. I went to the bioinformatics section and got help from @terdon. This is the most basic way of doing what I described in the problem. Note, however, that anyone trying to find ORFs and transcribe genes in a Python program has to take some biological rules into account, in particular the reading frame and the stop codons. This is just an example of how to start building the code. Also note that this code uses Biopython.
from Bio.Seq import Seq
from Bio.Seq import transcribe
genelist=[]
gene_1= 'A','C','G','G','A','C','T','A','T','T','C'
gene_2= 'G','G','C','C','A','T','G','A','G','T','A','A','C','G','C','A','T','A','G','G','G','C','C','C'
gene_3='G','G','G','C','C','C','A','T','G','A','C','G','T','A','C','T','A','G','G','G','G','C','C','C','A','T','G','C','A','T','T','C','A','T','A','G'
genelist.append(gene_1)
genelist.append(gene_2)
genelist.append(gene_3)
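def transcription(ORF):
    mRNA= ''
    foundStart = False
    foundEnd = False
    # Read the sequence three items at a time, starting at index 0
    for i in range(0, len(ORF), 3):
        # Each gene is a tuple of single letters, so join them into a codon string
        codon= "".join(ORF[i:i+3])
        if codon == 'ATG' and not foundStart:
            foundStart = True
        # Between start and stop, append the transcribed codon (T becomes U)
        if foundStart and not foundEnd:
            mRNA = mRNA + transcribe(codon)
        # Stop appending once a stop codon has been seen
        if codon == 'TAG':
            foundEnd = True
    return(mRNA)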
mRNAs=[]
for gene in genelist:
    mRNA = transcription(gene)
    mRNAs.append(mRNA)
print(mRNAs)
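For anyone who wants the untranscribed fragments exactly as listed in the question, here is a minimal sketch in plain Python (no Biopython). It is one possible approach, not a full ORF finder. The names `extract_fragments`, `START` and `STOPS` are made up for this illustration. It assumes that a start codon may sit at any offset in the sequence, that reading then continues in frame (three bases at a time) until the next stop codon, and that `TAG` is the only stop codon, as in the question.

```python
START = "ATG"
STOPS = {"TAG"}  # assumption: only TAG, as in the question


def extract_fragments(gene, start=START, stops=STOPS):
    """Return the bases between each start codon and the next in-frame stop codon."""
    seq = "".join(gene)  # accepts a string or a tuple of single letters
    fragments = []
    last = len(seq) - 3  # last index at which a full codon can begin
    i = 0
    while i <= last:
        if seq[i:i + 3] != start:
            i += 1  # no start codon here, slide forward by one base
            continue
        # Start codon found: read codon by codon until a stop codon appears.
        j = i + 3
        while j <= last and seq[j:j + 3] not in stops:
            j += 3
        if j <= last:
            fragments.append(seq[i + 3:j])  # keep only what lies between start and stop
            i = j + 3  # resume scanning after the stop codon
        else:
            i += 1  # no in-frame stop for this start, keep scanning
    return fragments


gene_1 = "ACGGACTATTC"
gene_2 = "GGCCATGAGTAACGCATAGGGCCC"
gene_3 = "GGGCCCATGACGTACTAGGGGCCCATGCATTCATAG"

frag_int = []
for gene in (gene_1, gene_2, gene_3):
    frag_int.extend(extract_fragments(gene))

print(frag_int)  # ['AGTAACGCA', 'ACGTAC', 'CATTCA']
```

Because the function joins its input first, it also works on the tuple-of-letters genes used above. If the mRNA is needed rather than the DNA fragment, each result can be passed through Biopython's `transcribe`.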