artificial-intelligence genetic-algorithm evolutionary-algorithm genetics

What is genetic drift and how does it affect EAs?

I have read in some articles on evolutionary computing that the algorithms generally converge to a single solution due to the phenomenon of genetic drift. There is a lot of content on the Internet, but I can't get a deep understanding of this concept. I need to know, simply and precisely:

What is genetic drift in the context of evolutionary computing?
How does it affect the convergence of an evolutionary algorithm?

Solution

To better understand the original concept of genetic drift (Biology), I suggest you read this Khan Academy's article. Simply put, you can think of it as an evolutionary phenomenon in which the frequency of one or more alleles (versions of a gene) in a population changes due to random factors (unrelated to the fitness of each individual). If the fittest individual of a population is struck, out of bad luck, by a lightning and dies before reproducing, he won't leave offspring (although he has the highest fitness!). This is an example (somewhat absurd, I know) of genetic drift.

Now, in the specific context of evolutionary algorithms, this paper provides a good summary on the subject:

EAs genetic drift can be as a result of a combination of factors, primarily related to selection, fitness function and representation. It happens by unintentional loss of genotypes. For example, random chance that a good genotype solution never gets selected for reproduction. Or, if there is a ‘lifespan’ to a solution and it dies before it can reproduce. Normally such a genotype only resides in the population for a limited number of generations.

(Sloss & Gustafson, 2019)

Finally, I will give you a real example of genetic drift acting on a genetic algorithm. Recently, I've used a simple neuroevolution algorithm to create an agent capable of playing the Snake game (GitHub repo). In my implementation of the game, the apples appear in random positions of the screen. When executing the evolutionary process for the first time, I noticed a big fluctuation in the population's best fitness between consecutive generations - overall, it wasn't improving much. Because of this, my algorithm was unable to converge to a good solution.

After some debugging, I found out that this was being caused by genetic drift. Because the apples spawned in random positions, some individuals, not necessarily the fittest, were lucky and got "easy apples", thus achieving a high fitness and leaving more offspring. Do you see the problem here?

Suppose that snake A is better at the game than snake B, because it can move towards the food, while B only moves randomly. Now, suppose that the first food that appeared for snake A was in a corner of the screen (a difficult position) and A died shortly after eating the apple. Now, suppose that snake B was lucky enough to have 3 apples spawn in a row, one after the other. Although B is "dumber" than A, it will leave more offspring, because it achieved a greater fitness. B's offspring will "pollute" the next generation, because they'll probably be "dumb" like B.

I solved the problem using a better apple positioning algorithm (I defined a minimum distance between the spawning position of two consecutive apples) and by calculating each individual's final fitness as the average of its fitness in several playing sessions. This greatly reduced (although it did not eliminate) the interference of genetic drift in my algorithm.

I hope this helps. You can also take a look at this video (it's in Portuguese, but English subtitles are available), where I explain some of the strategies I used to make the Snake AI.