Search code examples
c#algorithmgraphgraph-theorysubgraph

Find subgraphs in a directed graph which are isolated by certain properties


Please excuse my small knowledge of graph theory vocabulary.

I can only describe the problem with common english words. Maybe someone can point me into the right direction and/or terms to look up.

The problem came up as part of the implementation of a visual programming language. Where a vertex is a function/method and the edges transport data between the functions. Now there is the following problem:

It could be allowed to connect the output of vertex A with the type of Collection< TItem > to the input of vertex B with type TItem. And then the output of vertex B with type TItem to the input vertex C with type Collection< TItem >. This would tell the compiler that it has to wrap a foreach function around vertex B to apply the function of B to each item in the collection from A and output the new items as collection to the input of C. So the edge from A to B is a many to one connection and from B to C is one to many.

Now the actual problem is, what kind of algorithm would find a (directed) subgraph that is surrounded/isolated by one to many connections? so that the compiler would wrap a foreach function around this particular subgraph? I've tried to visualize the problem in this picture:

enter image description here


Solution

  • I would suggest the following algorithm:

    Step 1 Walk through all the nodes. If you find a blue node, do a depth-first search in the directed graph to find out the set of white nodes reachable from it. Don't cross blue nodes while doing the DFS. Along with the set of nodes, store the starting blue node and the outgoing blue nodes that you discovered during the DFS.

    You end up with multiple sets of white nodes, along with information about the incoming and outgoing blue nodes:

    enter image description here

    (bear with me, my mouse drawing skills are really bad)

    Step 2 As you can see, you might have overlaps. There are two possibilities to resolve this:

    • Merge overlapping sets by using a disjoint-set data structure afterwards. This results in a O(n² + m) worst case runtime.

    • Avoid creating the overlaps in the first place by modifying the standard DFS algorithm. It should detect when you reach a node that you have already seen in one of the previously explored sets. It should then not explore the subgraph further, but record that the currently explored set and the overlapping one are to be merged later. Afterwards you can find the connected components in the merging graph. This will give you a O(n + m) runtime, which is a lot better.

    You end up with a collection of disjoint sets of white nodes together with respective incoming and outgoing blue nodes:

    enter image description here