Fast Evolutionary Algorithm Mutation: Mutating n random genes instead of evaluating every gene

I'm trying to optimize the code for my genetic algorithm. The DNA is currently a dictionary to increase lookup speeds for fitness calculations, but can be easily changed to a numpy array.

The mutation rate is supposed to be 1/L, L being the length of the DNA.

This solution works, but it's quite slow:

# Iterate through genome, flip a gene with a probability of 1/L
def mutate(self):
      self.dna = dict(
        [(i, flip(self.dna[i])) if random.randint(0,num_genes) < 1 
        else (i, self.dna[i]) for i in range(num_genes)]
        )

This solution is about twice as fast, but for some reason it produces much worse results:

# Select n number of genes calculated by 1/L, then change n random genes
def mutate(self):
      num_mutations = sum(np.random.choice([0,1], num_genes, p=[(num_genes-1)/num_genes, 1/num_genes]))
      for i in np.random.choice(num_genes-1, num_mutations):
        self.dna[i] = flip(self.dna[i])

As far as I can tell they mutate the same number of genes, and the output should be the same. 10 runs of both with set random seeds show that the latter approach results in way worse fitness results however.

Why does the second approach result in worse dna fitness? How do these approaches differ in their outcome?

Thank you for any help.

Solution

First off: the vectorization

There's no point in using a dict when your indices are integers - looking up an integer index is always faster than using a hash-table. Also you can vectorize this using numpy - make your self.dna a numpy array instead of a list or a dict, which may make for a 10x-100x speedup.

def flip(x):  # x is a vector of genes, lets a binary array here
    return ~x
mutation_mask = np.random.rand(n_genes)<1./len(self.dna)
self.dna[mutation_mask] = flip(dna[mutation_mask])

Second off: Why your two algorithms are different:

I don't know, they look like they should be the same statistically. The one thing I can think is that in the second you're modifying self.dna in place with self.dna[i]=..., rather than reassigning self.dna=..., so any other area in the code that has a the old self.dna will have their copy changed too in the second case. You could fix this by inserting self.dna = self.dna.copy() before your for-loop in the second example.