Tags: genetic-algorithm, evolutionary-algorithm, differential-evolution

Can Differential Evolution solve problems that require mutation of dependent parameters?


One of the differences between Differential Evolution (DE) and Genetic Algorithms (GA) is that DE discards a new candidate unless it is fitter than the old candidate it was derived from, while GA allows "less fit" candidates to survive with some probability. At first DE's approach sounds like a good idea, but I believe it prevents DE from solving the following category of problems.

Imagine that we are trying to maximize the fitness score of:

A - [max(0, A - 50) * B] + [max(0, A - 75) * 2 * B]

where parameters range from 0 to 100.

  • Initially it is beneficial to increase A, until it reaches 50.
  • Next, it is beneficial to set B to zero.
  • Next, it is beneficial to increase A to 75.
  • Next, it is beneficial to simultaneously increase B and A.

This last point is important: if either A or B is increased independently of the other, the fitness score will drop.
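For concreteness, the fitness function can be written out as a short Python sketch (the function name and the sample points are my own, for illustration):

```python
def fitness(a, b):
    """Fitness to maximize; both parameters range over [0, 100]."""
    return a - max(0, a - 50) * b + max(0, a - 75) * 2 * b

# Increasing A alone pays off up to A = 50 (here with B = 1):
print(fitness(50, 1))   # 50
print(fitness(60, 1))   # 50 -- the gain in A is cancelled by the penalty term
# Setting B to zero removes the penalty, so A can grow again:
print(fitness(60, 0))   # 60
# Past A = 75 the bonus term counteracts the penalty, so A and B can be
# raised together; at A = 100 the two terms balance exactly:
print(fitness(100, 2))  # 100
```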

Coming back to the Differential Evolution algorithm, I don't see how it can solve the above problem, because initially we want to mutate only one parameter per generation, but eventually we want to mutate multiple parameters per generation. If we mutate multiple parameters per generation too early, we decrease the probability of survival which, in turn, decreases the rate of evolution. But if we mutate only one parameter at a time, we will never find the global maximum.

My questions are as follows:

  • Is this a known problem with the Differential Evolution algorithm?
  • Are there known solutions?
  • Do Genetic Algorithms suffer from the same problem?

I am not asking for a solution to the specific function mentioned above. I am asking whether it is possible for Differential Evolution to solve problems where you don't know ahead of time how many parameters need to be mutated at any given time and you want to end up as close as possible to the global maximum.


Solution

  • DE is a popular optimization metaheuristic with a significant number of variants (some examples here).

    The first DE is due to R. Storn and K. Price (1997). Since then, many changes have been proposed: variations of the main operators, hybridization, automated parameter tuning, self-adaptation schemes, structured populations...

    So a precise answer to the question would require more details about the DE flavour you're referring to.

    Anyway a typical scheme for DE is:

     g = 0                          // Generation 0
     P(g) = {x_1(g), ..., x_n(g)}   // Population initialization
     while (!stop())                // Some stop condition
       for i = 1 to n
         y_i = new_mutant(P(g))     // e.g. rand/1 mutation
         z_i = crossover(x_i(g), y_i)
         if f(z_i) < f(x_i(g))      // minimization: keep the fitter vector
           x_i(g + 1) = z_i
         else
           x_i(g + 1) = x_i(g)
       ++g
    
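The scheme above can be sketched in runnable Python; this is a minimal rand/1/bin variant for minimization, with box constraints handled by clipping (all names and default parameter values here are my own choices, not part of the answer):

```python
import random

def de_minimize(f, bounds, n=20, F=0.8, CR=0.9, generations=100, seed=0):
    """Minimal DE (rand/1/bin) minimizing f over box constraints `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Population initialization over the full parameter ranges
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    for _ in range(generations):
        for i in range(n):
            # rand/1 mutation: three distinct individuals, none equal to i
            r1, r2, r3 = rng.sample([k for k in range(n) if k != i], 3)
            mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                      for j in range(dim)]
            # Binomial crossover with one forced component
            j_rand = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() < CR or j == j_rand)
                     else pop[i][j] for j in range(dim)]
            # Clip the trial vector to the box constraints
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            # Greedy selection: the trial replaces the target only if better
            if f(trial) < f(pop[i]):
                pop[i] = trial
    return min(pop, key=f)
```

For the question's problem one would minimize the negated fitness over the bounds [(0, 100), (0, 100)].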

    A frequent choice for the crossover() function is the binomial crossover (which is similar to GA uniform crossover):

    if (random(0,1) < CR || j == j_rand)  // j_rand is a random index, forcing
      z_i[j] = y_i[j]                     // at least one mutant component
    else
      z_i[j] = x_i[j]
    

    (the strategy described is often called rand1bin)

    In general more than one component (and at least one) is taken from the mutant vector.
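As a sketch, binomial crossover with the forced random index can be written like this in Python (the function name is mine):

```python
import random

def binomial_crossover(target, mutant, CR, rng=random):
    """Take each component from the mutant with probability CR; the
    component at a randomly chosen forced index always comes from the
    mutant, so at least one component is always taken from it."""
    j_rand = rng.randrange(len(target))
    return [mutant[j] if (rng.random() < CR or j == j_rand) else target[j]
            for j in range(len(target))]
```

With CR = 1 the trial vector equals the mutant; with CR = 0 exactly one component is still taken from it.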

    From the start to the end of evolution, DE produces offspring with one or both parameters A / B mutated (this is controlled via CR).

    This isn't an issue since DE is a population-based algorithm: what is "eventual" for an individual is "initial" for another.

    rand1bin in particular is a good exploration strategy, and an initial population that exploits the full numerical range of the parameters (i.e. [0, 100] in the example) avoids the described problem.

    It seems to me that the order of modification could be an issue only for a tiny population, or for a population with unnecessarily restricted initial diversity.
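To quantify how CR controls this in the two-parameter example: with binomial crossover and a forced index, the non-forced component is taken from the mutant with probability CR, so a fraction CR of trial vectors have both A and B mutated and a fraction 1 - CR have exactly one. A quick Monte Carlo check (the function name and sample values are mine):

```python
import random

def frac_both_mutated(CR, dim=2, trials=20000, seed=1):
    """Estimate the fraction of trial vectors taking *all* components
    from the mutant under binomial crossover with a forced index."""
    rng = random.Random(seed)
    both = 0
    for _ in range(trials):
        j_rand = rng.randrange(dim)
        if all(rng.random() < CR or j == j_rand for j in range(dim)):
            both += 1
    return both / trials

print(frac_both_mutated(0.3))  # close to 0.3
print(frac_both_mutated(0.9))  # close to 0.9
```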


    [3D surface plot and contour plot of the fitness function]

    Various simulations with rand1bin / best1bin and a small population (20 individuals, i.e. 10 × the number of parameters) confirm that the specific function isn't "hard": the global maximum is regularly found within 120 generations.
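Such simulations are easy to reproduce with, for example, SciPy's differential_evolution (a tooling assumption on my part; SciPy minimizes, so the fitness is negated, and popsize=10 with two parameters gives the 20 individuals mentioned above):

```python
from scipy.optimize import differential_evolution

def neg_fitness(x):
    """Negated fitness from the question, for minimization."""
    a, b = x
    return -(a - max(0.0, a - 50.0) * b + max(0.0, a - 75.0) * 2.0 * b)

result = differential_evolution(neg_fitness, bounds=[(0, 100), (0, 100)],
                                strategy='best1bin', popsize=10, seed=42)
print(result.x, -result.fun)  # -result.fun approaches the maximum of 100
```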