I've just started to learn programming with Python. In class we were asked to generate a random DNA sequence that does NOT contain a specific 6-letter sequence (AACGTT). The point is to write a function that always returns a legal sequence. Currently my function generates a legal sequence about 78% of the time. How can I make it return a legal sequence 100% of the time? Any help is appreciated.
Here is what my code looks like for now:
from random import choice

def generate_seq(length, enzyme):
    list_dna = []
    nucleotides = ["A", "C", "T", "G"]
    i = 0
    while i < 1000:
        nucleotide = choice(nucleotides)
        list_dna.append(nucleotide)
        i = i + 1
    dna = ''.join(str(nucleotide) for nucleotide in list_dna)
    return(dna)

seq = generate_seq(1000, "AACGTT")
if len(seq) == 1000 and seq.count("AACGTT") == 0:
    print(seq)
One option is to check the last few entries inside your loop and only keep a newly appended nucleotide if it does not complete the 'bad' sequence. However, this may make "AACGT" followed by a letter other than "T" show up more often than it would in a truly random sequence, because the final letter is redrawn whenever it would have been a "T".
from random import choice

def generate_seq(length, enzyme):
    list_dna = []
    nucleotides = ["A", "C", "T", "G"]
    i = 0
    # Use the length argument instead of a hard-coded 1000
    while i < length:
        nucleotide = choice(nucleotides)
        list_dna.append(nucleotide)
        # Check for the invalid sequence. If found, remove the last element and redraw
        if ''.join(list_dna[-len(enzyme):]) == enzyme:
            list_dna.pop()
        else:
            i = i + 1
    dna = ''.join(list_dna)
    return dna

seq = generate_seq(1000, "AACGTT")
if len(seq) == 1000 and seq.count("AACGTT") == 0:
    print(seq)
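If the slight bias mentioned above matters to you, another option is to throw away the whole sequence and redraw it whenever it contains the forbidden site. This is only a sketch and not part of the code above: the name `generate_seq_rejection` and the `nucleotides` default argument are made up for illustration. Since the question reports that roughly 78% of plain random draws are already legal, the loop should usually finish after one or two attempts for a length of 1000, though it gets slower for much longer sequences.

```python
from random import choice

def generate_seq_rejection(length, enzyme, nucleotides="ACGT"):
    """Hypothetical helper: redraw the entire sequence until it is legal."""
    while True:
        # Draw a completely fresh random sequence on every attempt
        dna = "".join(choice(nucleotides) for _ in range(length))
        # Accept it only if the forbidden site does not occur anywhere
        if enzyme not in dna:
            return dna

seq = generate_seq_rejection(1000, "AACGTT")
print(len(seq), "AACGTT" in seq)  # prints: 1000 False
```

Because every accepted sequence comes from an unmodified random draw, each legal sequence is equally likely, at the cost of sometimes generating more than once.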