I am trying to solve the problem below. It is like k-minimum-spanning-tree and the steiner tree problem, but it is with a graph.
- We have a non-negative undirected weighted graph G = (V, E).
- For every pair of vertices v1 and v2 there exists an edge e12. In other words, every vertex is connected to every other vertex.
- We shall create a subset of the vertices U that contains k vertices.
- Our goal is to select the n vertices in U such that the sum of the edges from each vertex in U to every other vertex is minimized. In other words, we want to select the vertices in U so that the sum of all the edges from the nodes in U outwards is minimized.
- 1 < n < number of vertices
Am I correct that neither k-MST or the Steiner tree approximation solutions will work? If so, what is this problem called? And what are the solutions? I am fine with using heuristics or approximations to solve this problem and don't need formal proofs.
You are correct that K-MST or Steiner tree won't work - they produce a tree only, while you need a graph with special properties, e.g. with 0 cost between vertices within U and minimal cost for all other edges if I understood your problem.
While juancn's answer looks correct, I think that using something like metaheuristic, e.g. simulated annealing, or constraint satisfaction approaches will be better.
For metaheuristics:
- Compute edge-cost for each vertex
- Greedely pick k vertices - it forms your initial solution
- In case of SA, start modifying initial solution be including/excluding new vertices one by one (maybe there is a better approach, you should study it yourself)
- Given enough time, it should converge to good enough solution
For constraint satisfaction:
- Objective: select k vertices from a given graph. For each vertex introduce
a Boolean variable, if it's 1 - the vertex is a part of U, otherwise, it's 0. Then your objective is:
sum (vertices==1) = k
- Subject to: the minimal sum of edges weights between k-vertices and others. If I'm correct, the cost for edges in U is 0. I don't know how to formulate such constraints properly, but they should be rather simple.
- Run a solver with timeout, let's say a few hours.
For the last approach, constraint satisfaction, memory could be a problem - you need a lot of memory to represent a fully-connected graph and all constraints. Still, check Minizinc, lpsolve and coin-or projects.