java graph permutation genetic-algorithm

Genetic Algorithms: how to implement permutation-chromosomes with less than n! valid permutations

I'm currently working on a genetic algorithm which I want to find (or approximate the best permutation in a directed, non-cyclic, non-state-based graph. A possible graph could look as follows: Example-Graph

(Note that multiple incoming Nodes mean multiple conditions. So in order to chose G, both B and F have to be chosen first)

As opposed to something like the Traveling Salesman Problem, not all nodes on my graph are connected (hence B and E), connections won't work for both directions (A->B is fine, but B->A is not possible) and there is no node I could define as a current position (meaning after going from A to B, D is still a valid option). Therefore, while searching for permutations to put through the fitness-calculation, the solution space isn't n! but much less (~145 for the example given above). The rule for validation a permutation would be "for any node at position n, all its preconditions must be at a position less than n"

For example "A-B-D-C-E-F-H-G-I" would be a valid permutations, while "I-G-C-H-E-F-D-B-A" would be pretty much as invalid as it can get.

With this information, I could just verify any given permutation within the fitness-function and assign a value of 0, if it is invalid. Yet, I'm hoping that with your help I can find a more efficient solution, since I'm working with graphs that can have around 300 nodes, and calculation for all the invalid possibilites would be unacceptably time consuming. So I want to design the chromosome in a way that for both randomized starting population and evolutionary operations, only valid individuals are added to any given population.

As for testing, I'm using the JGAP library for Genetic Algorithms and Genetic Programming in Java, but an implementation of this solution isn't mandatory.

Thanks a lot for you help and please forgive me for any stupid expression since I'm not a native speaker and for any stupidity within this question since I'm fairly new to genetic algorithms.

Solution

Your graph is DAG (Directed Acyclic Graph), as you said. Also, node A that is connected to node B must come before the node B. This is basically the definition of topological sorting of such graph. You just want to find such a sorting (since there can be more than one) that is best according to a measure of your choice.

Here is a solution I propose.

Genotype

For each node in the graph assign a number. The numbers for these nodes are going to be the genotype. So for the graph you posted as an example, you would have 9 numbers. Let's give it some notation: K(n) will be the number assigned to node n.

Decoding the genotype

To decode genotype (the set of numbers) into the phenotype (the sorting), follow this procedure (basically Khan's algorithm with tie-breaking):

input: N is a set of all nodes in the graph
S += {r | r in N && r has no incoming connections }  // a set with the root nodes
I := {n -> |predecessors(n)| | n in N}  // a mapping from the nodes to numbers, the numbers being the numbers of incoming edges to the nodes
ordering := []  // a list of nodes
while |S| > 0  // while the open set is non-empty
    n := arg min_q {K(q) | q in S}  // from the open set, select the node with the lowest assigned number (see above)
    S -= {n}  // remove the node from the open set
    ordering += n  // add the node to the ordering
    for each q in successors(n)
        I[q] -= 1  // decrease the number of incoming nodes for each successor of n
    S += {q | I[q] == 0}  // add all nodes that have no incoming edge left to S
return ordering

This way you always get a valid solution and the numbers assigned to the nodes just determine which of the number of valid solutions is going to be constructed. You can happily evolve the numbers and you will get different orderings.

The vanilla Khan's algorithm has a running time O(|E| + |V|), i.e. linear in the number of nodes plus the number of edges in the graph. Here we need to sort the set S so the complexity will get higher, depending on the datastructure used for S. If it is a heap-based priority queue, you get something like (I'm guessing now as I'm too lazy to compute/prove it) O(|E| + |V| * log |V|). You probably want to optimize this procedure as much as possible as it is going to be called a lot.

Remarks

This decoding trick is called indirect encoding, i.e. you evolve something that is not evaluated directly but is rather transformed to something else that is then evaluated. This has the nice benefit of always producing a valid solution, but it also has a major drawback: small change in the genotype may result in a big change in the phenotype and vice versa. This might make it hard for the genetic algorithm.

I also suggest you try other optimisation frameworks than just GA, e.g. Simmulated annealing.