This is my first post on these forums, thanks in advance for all responses.
I am developing a java application in which I have encountered what I believe is termed a “Combinatorial Optimization Problem”. I have only basic math skills, so trying to investigate the setup of such a problem have not been fruitful so far.
Basically, what I would like to do is to program the most efficient way of finding the optimal subset n of a larger set N with variables v1, v2, v3 etc. The variables all have a definite value/score in addition to a value/score that is dependant on certain other variables that may or may not be included in the subset.
I am interested in selecting the subset which gives the maximum total value/score.
So, for example, the full set N consists of the following variables and have the following definite values as well as relations to the other variables:
v1 8 { v2 ; v8 }
v2 6 { v1 ; v4 }
v3 9 { }
v4 7 { v2 ; v5 ; v8 }
v5 6 { v4 ; v10 }
v6 8 { v7 }
v7 5 { v6 }
v8 9 { v1 ; v4 }
v9 6 { }
v10 7 { v5 }
Having a relation to one or more other chosen variable means that the total value will have a certain “take-off” – for the sake of this example let’s say one for each relation. So selecting the first five variables as a subset will give a total value of 30:
v1 8 { v2 ; v8 } = 8 - 1 = 7
v2 6 { v1 ; v4 } = 6 - 2 = 4
v3 9 { } = 9 - 0 = 9
v4 7 { v2 ; v5 ; v8 } = 7 - 2 = 5
v5 6 { v4 ; v10 } = 6 - 1 = 5
This is of course not a problem for such small sets, but I am currently facing sets of 100K and subsets of 10K – with the current algorithm, my computer calculated the solution in 3 days!
I do not necessarily need a code for solving this, but rather the optimal mathematical way to do it (if there are any!). Keep in mind though that I’m having a hard time understanding mathematical notation above basic level.
Again, thanks in advance for all responses!
Unfortunately this problem is NP-hard to find the optimal solution.
If you could solve this problem efficiently then you could solve the NP-hard maximum independent set problem by setting the value for each vertex to 1, and a very large penalty for each edge.
So you should instead look for approximate solutions. You may find you get a reasonable answer with simulated annealing or genetic algorithms.