First of all, I have to admit I'm not good at graph theory.
I have a weakly connected directed graph G=(V,E)
where V
is about 16 millions and E
is about 180 millions.
For a given set S
, which is a subset of V
(size of S
will be around 30), is it possible to find a weakly connected sub-graph G'=(V',E')
where S
is a subset of V'
but try to keep the number of V'
and E'
as small as possible?
The graph G
may change and I hope there's a way to find the sub-graph in real time. (When a process is writing into G
, G
will be locked, so don't worry about G
get changed when your sub-graph calculation is still running.)
My current solution is find the shortest path for each pair of vertex in S
and merge those paths to get the sub-graph. The result is OK but the running time is pretty expensive.
Is there a better way to solve this problem?
If you're happy with the results from your current approach, then it's certainly possible to do at least as well a lot faster:
Assign each vertex in S to a set in a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure. Then:
Another way to think about doing exactly the same thing:
This will not necessarily find the smallest possible subgraph, but it will minimize its maximum distance from S.