Given this string:
dna3 = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCC"
the code below should print the following 4 substrings:
ATGTAA
ATGAATGACTGATAG
ATGCTATGA
ATGTGA
However, it is printing the following:
ATGTAA
ATGAATGACTGATAG
ATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCC
ATGCTTGTATGCTATGAAAATGTGAAATGACCC
ATGCTATGA
ATGAAAATGTGA
ATGTGA
ATGACCC
None
Could someone please help me figure out what is going wrong? Thank you.
def findStopIndex(dna, index):
    stop1 = dna.find("tga", index)
    if stop1 == -1 or (stop1 - index) % 3 != 0:
        stop1 = len(dna)
    stop2 = dna.find("taa", index)
    if stop2 == -1 or (stop2 - index) % 3 != 0:
        stop2 = len(dna)
    stop3 = dna.find("tag", index)
    if stop3 == -1 or (stop3 - index) % 3 != 0:
        stop3 = len(dna)
    return min(stop1, min(stop2, stop3))

def printAll(dna):
    gene = None
    start = 0
    while True:
        loc = dna.find("atg", start)
        if loc == -1:
            break
        stop = findStopIndex(dna, loc + 3)
        gene = dna[loc:stop + 3]
        print gene.upper()
        start = loc + 3

print printAll(dna3.lower())
We may need some additional information about the DNA structure. From what you describe, it seems the substrings can't overlap each other. In that case, you need to replace start = loc + 3 with start = stop + 3, so that the next search begins after the stop codon of the gene just found (the characters also seem to be grouped three by three, based on what you described).
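To make the overlap concrete, here is a small sketch using the second substring from the question. It only relies on str.find; the variable names are illustrative and not part of the original code.

```python
dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCC".lower()

# Second gene reported in the question: ATGAATGACTGATAG
second_gene = "atgaatgactgatag"
gene_start = dna.find(second_gene)
gene_end = gene_start + len(second_gene)  # index just past its stop codon

# With start = loc + 3, the next search begins inside that gene ...
next_with_loc = dna.find("atg", gene_start + 3)
# ... whereas with start = stop + 3 it begins right after the gene.
next_with_stop = dna.find("atg", gene_end)

print(gene_start, gene_end)  # 10 25
print(next_with_loc)         # 14, an ATG inside the gene just found
print(next_with_stop)        # 27, the first ATG after the gene
```

The ATG at index 14 is the one that produces the unwanted third line of the original output.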
Finally, you don't need the print in print printAll(dna3.lower()). That outer print is what shows the None at the end of your output: printAll has no return value, so it returns None, and that is what gets printed.
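As a minimal illustration of where the None comes from (show_genes is a hypothetical stand-in for printAll, written so that it runs the same way under Python 2 and Python 3):

```python
def show_genes():
    # Hypothetical stand-in for printAll: it prints, but has no return statement.
    print("ATGTAA")


result = show_genes()  # prints ATGTAA
print(result)          # prints None, the implicit return value
```

Calling the function on its own line, without the outer print, avoids the extra None.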
With those modifications, my output is:
ATGTAA
ATGAATGACTGATAG
ATGCTTGTATGCTATGAAAATGTGAAATGACCC
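This still differs from the four substrings expected in the question. The third line runs to the end of the string because findStopIndex returned len(dna), meaning it found no in-frame stop codon after that ATG. If the intended behaviour is to skip such an ATG and keep searching (an assumption, since the question does not state it), a sketch along the following lines prints the four expected substrings. The names find_stop_index and find_genes are hypothetical, and the stop search deliberately mirrors the original findStopIndex, which only looks at the first occurrence of each stop codon.

```python
def find_stop_index(dna, index):
    """Return the index of the nearest in-frame stop codon, or len(dna) if none.

    Like the original findStopIndex, only the first occurrence of each
    stop codon at or after index is considered.
    """
    best = len(dna)
    for codon in ("tga", "taa", "tag"):
        pos = dna.find(codon, index)
        if pos != -1 and (pos - index) % 3 == 0:
            best = min(best, pos)
    return best


def find_genes(dna):
    """Collect non-overlapping genes that end in an in-frame stop codon."""
    genes = []
    start = 0
    while True:
        loc = dna.find("atg", start)
        if loc == -1:
            break
        stop = find_stop_index(dna, loc + 3)
        if stop == len(dna):
            # No in-frame stop codon: skip this ATG (assumed behaviour).
            start = loc + 3
        else:
            genes.append(dna[loc:stop + 3].upper())
            # Resume after the stop codon so genes cannot overlap.
            start = stop + 3
    return genes


dna3 = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCC"
for gene in find_genes(dna3.lower()):
    print(gene)
```

Returning a list instead of printing inside the function also sidesteps the stray None, since the caller decides what to print.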
I hope this helps.