Search code examples
algorithmgraphtopological-sort

Topological sort variant algorithm


I have a set of data on which I need to perform a topological sort, with a few assumptions and constraints, and I was wondering if anyone knew an existing, efficient algorithm that would be appropriate for this.

  • The data relationships are known to form a DAG (so no cycles to worry about).
  • An edge from A to B indicates that A depends on B, so B must appear before A in the topological ordering.
  • The graph is not necessarily connected; that is, for any two nodes N and M there may be no way to get from N to M by following edges (even if you ignore edge direction).
  • The data relationships are singly linked. This means that when there is an edge directed from A to B, only the A node contains information about the existence of the edge.

The problem can be formulated as follows:

Given a set of nodes S in graph G which may or may not have incoming edges, find a topological ordering of the subgraph G' consisting of all of the nodes in G that are reachable from any node in set S (obeying edge direction).

This confounds the usual approaches to topological sorting because they require that the nodes in set S do not have any incoming edges, which is something that is not true in my case. The pathological case is:

A --> B --> D
|     ^     ^
|     |     |
\---> C ----/

Where S = {B, C}. An appropriate ordering would be D, B, C, but if a normal topological sort algorithm happened to consider B before C, it would end up with C, D, B, which is completely wrong. (Note that A does not appear in the resulting ordering since it is not reachable from S; it's there to give an example where all of the nodes in S might have incoming edges)

Now, I have to imagine that this is a long-solved problem, since this is essentially what programs like apt and yum have to do when you specify multiple packages in one install command. However, when I search for keyphrases like "dependency resolution algorithm", I get results describing normal topological sorting, which does not handle this particular case.

I can think of a couple of ways to do this, but none of them seem particularly elegant. I was wondering if anyone had some pointers to an appropriate algorithm, preferably one that can operate in a single pass over the data.


Solution

  • I don't think you'll find an algorithm that can do this with a single pass over the data. I would perform a breadth-first search, starting with the nodes in S, and then do a topological sort on the resulting subgraph.