Search code examples
algorithmrandomgenetic-algorithm

Genetic Algorithm: 2D chromosome; crossover and mutation probabilities


Let me start with the version of genetic algorithm I am implementing. I apologize in advance for any terminology errors that I make here. Please feel free to correct me.

The chromosome for my problem is two dimensional. Three rows and thirty two columns. Essentially the alleles (values) are indexes that are contained by this chromosome.

How an Index is formulated
Each row and column (together) of the chromosome refer to a single gene. Each gene contains an integer value (0 - 30). A single column (I believe referred to as a gnome) therefore refers to an index of a four dimensional array containing user data on which the fitness function operates.

This is how a chromosome would look like

11 22 33 14 27 15 16 ...
3  29  1  7 18 24 22 ...
29  9 16 10 14 21  3 ...

e.g. column 0 ==> data[11][3][29]
where
     11 -> (0, 0); 0th row, 0th column 
     3  -> (1, 0); 1st row, 0th column
     29 -> (2, 0); 2nd row, 0th column

For completeness, the fitness function works as follows: (for a single chromosome)

for first 10 iterations: (user 0 to 9)
    for each column (genome)
         consider gene value for first row as the first index of data array
         consider gene value for the second row as the second index of data array
         consider gene value for the third row as the third index of data array

         so if the first column contains [11][3][29] user = 0 
         it refers to data[0][11][3][29]

     SUM the data array value for all columns and save it
Do the same for all iterations (users)

 for second 10 iterations: (user 10 to 19)
    for each column (genome)
         consider gene value for the SECOND row as the FIRST index of data array
         consider gene value for the THIRD row as the SECOND index of data array
         consider gene value for FIRST row as the THIRD index of data array

     SUM the data array value for all columns and save it
Do the same for all iterations (users)


for third 10 iterations: (user 20 to 29)
    for each column (genome)
         consider gene value for the THIRD row as the FIRST index of data array
         consider gene value for FIRST row as the SECOND index of data array
         consider gene value for the SECOND row as the THIRD index of data array

     SUM the data array value for all columns and save it
Do the same for all iterations (users)

Out of the 30 (sum) values calculated so far, assign the minimum value as fitness value
to this chromosome.

The point to explain the fitness function here is to explain the optimization problem I am dealing with. I am sorry I could not formulate it in Mathematical notation. Anyone who could do it, his/her comment is more than welcome. Essentially it is maximizing the minimum X. Where X refers to data contained in data array. (maximizing is done over generation where the highest fitness chromosomes are selected for next generations)

Q1) I am using a single random number generator for crossover and mutation probabilities. Generally speaking, is this correct was to implement it with a single generator? I ask this question because the crossover rate I chose is 0.7 and mutation to be 0.01. My random number generator generates a uniformly distributed integer number. The number are between 0 to (2^31 - 1). If a number generated by the random function lies under the border where it satisfies mutation, the same number also satisfies crossover. Does this effect the evolution process?

NOTE: the highest number that the random number generates is 2147483647. 1% of this value is 21474836. so whenever a number less than 21474836, it suggests that this gene can be mutated. this number also suggest that crossover must be done. Shouldn't there be different generators?

Q2) Although I see that there is a relation between genes is a column when calculating fitness. But while performing mutation, all the genes should be considered independent from each other or all the rows for a genome (column) should be effected by mutation.

Explanation As I learned in a binary string of e.g. 1000 bits where each bit corresponds to a gene, with a mutation rate of 1% would mean 1 out of 100 bits might get flipped. in my case however I have chromosome which is 2D (3 rows, 32 columns). Should I consider all 96 genes independent of each other or simply consider 32 genes. And whenever I need a flip, flip the column all together. How does mutation work in 2D chromosome?

Q3) Do I really have a correlation between rows here. I am a bit confused?

Explanation I have 2D chromosome, whose column values altogether points to the data i have to use to calculate fitness of this chromosome. Genetic algorithm manipulates chromosomes where as fitness is assigned by the data that is associated with this chromosome. My question is how would genetic algorithm should treat 2D chromosome. Should there be a relation between the genes in a column. Can I get a reference to some paper/code where a 2D chromosome is manipulated?


Solution

  • I'm not sure if i understood the chromosome structure, but it doesn't matter, the concepts are the same:

    1 - You have a chromosome object, which you can access the individual genes

    2 - You have a fitness function, which takes a chromosome and outputs a value

    3 - You have a selection function, which selects chromosomes to mate

    4 - You have a crossover function, which generally takes 2 chromosomes, exchange genes between them and outputs two new chromosomes

    5 - You have a mutation operator, which acts randomly on the genes of a chromosome

    So

    Q1) You can use a single random generator, there's no problem at all. But why are you using integer numbers? It's much easier to generate a random between [0, 1).

    Q2) This is up to you, but generally the genes are mutated randomly, independent of each other (mutation happens after the crossover, but i think you already know that).

    EDIT: Yes, you should consider all the 96 genes independent of each other. For each mutation, you'll select one 'row' and one 'column' and modify (mutate) that gene with some probability p, so:

    for row in chromosome.row
        for col in row
            val = random_between_0_and_1
            if val < p
                chromosome[row][col] = noise
    

    Q4) It's up to you to decide what the fitness function will do. If this chromosome is 'good' or 'bad' at solving your problem, then you should return a value that reflects that.