Tags: python, for-loop, nested, bioinformatics, dna-sequence

Use the result of a for loop to create a new list in Python


I have created a mutate_v1 function that generates random mutations in a DNA sequence.

import random

def mutate_v1(sequence, mutation_rate):
    dna_list = list(sequence)
    for i in range(len(sequence)):
        r = random.random()
        if r < mutation_rate:
            # pick a random site and overwrite it with a random base
            mutation_site = random.randint(0, len(dna_list) - 1)
            dna_list[mutation_site] = random.choice(list('ATCG'))
    # return only after the loop has run once per position
    return ''.join(dna_list)

If I apply my function to every element of G0, I get a new generation (G1) of mutants (a list of mutated sequences).

G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

G1 = [mutate_v1(s,0.01) for s in G0]

#G1
['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']

How can I repeat my function up to G20 (20 generations)?

I can do it manually, like this:

G1   = [mutate_v1(s,0.01) for s in G0]
G2   = [mutate_v1(s,0.01) for s in G1]
G3   = [mutate_v1(s,0.01) for s in G2]
G4   = [mutate_v1(s,0.01) for s in G3]
G5   = [mutate_v1(s,0.01) for s in G4]
G6   = [mutate_v1(s,0.01) for s in G5]
G7   = [mutate_v1(s,0.01) for s in G6]

But I'm sure a for loop would be better. I have tried several approaches without success.

Can someone help, please?


Solution

  • Use range to iterate over the number of generations and store each generation in a list. Each generation is the result of mutating the previous one:

    G0 = ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
    
    generations = [G0]
    for _ in range(20):
        previous_generation = generations[-1]
        generations.append([mutate_v1(s, 0.01) for s in previous_generation])
    
    # each generation can then be accessed by its index
    print(generations[1])  # access generation 1
    print(generations[20]) # access generation 20
    

    Output (the mutations are random, so results differ between runs)

    ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAA']
    ['CTGAA', 'CTGAA', 'CTGAA', 'CTGAA', 'CTGAT']
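
If the loop needs to be reused, it can be wrapped in a function. The following is only a sketch under a few assumptions, not the original poster's code: `mutate` (a variant of mutate_v1 that returns after its loop has finished) and `evolve` are hypothetical names, and the seeded `random.Random` instance is an addition so that a run can be repeated exactly.

```python
import random


def mutate(sequence, mutation_rate, rng):
    """Hypothetical variant of mutate_v1: one mutation chance per position."""
    dna_list = list(sequence)
    for _ in range(len(dna_list)):
        if rng.random() < mutation_rate:
            # Pick a random site and overwrite it with a random base
            # (the new base may happen to equal the old one).
            site = rng.randrange(len(dna_list))
            dna_list[site] = rng.choice('ATCG')
    return ''.join(dna_list)


def evolve(population, n_generations, mutation_rate, seed=None):
    """Return [G0, G1, ..., Gn], each built by mutating the previous one."""
    rng = random.Random(seed)  # assumption: seeded for reproducibility
    generations = [list(population)]
    for _ in range(n_generations):
        previous = generations[-1]
        generations.append([mutate(s, mutation_rate, rng) for s in previous])
    return generations


generations = evolve(['CTGAA'] * 5, n_generations=20, mutation_rate=0.01, seed=42)
print(len(generations))   # 21: G0 plus 20 mutated generations
print(generations[20])    # the final generation
```

Because the starting population is stored at index 0, `generations[k]` is generation k, matching the G0 ... G20 naming used in the question.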