Search code examples
algorithmdirected-acyclic-graphstopological-sort

How this DAG topological restructuring can be called?


I had interest in Direct Acyclic Graphs (DAG) for a long time and after reading about Topological sort at Wikipedia, I did not find any special mentioning of an approach involving layers numbering (although layers are extensively mentioned for drawing). With this approach the graph is not technically topologically sorted, but knowing that every node contains the correct number for layer (level), we always can tell whether a particular node "bigger" than another topologically. On the other side as long as we don't have an ordered list, we can not enumerate the nodes topologically (although this can be done with a final conventional sort that compares the levels of the nodes).

This approach allows implementing arbitrary connecting while maintaining the correctness of levels information. The steps can be:

  • For any newly added node (without any connection) the level applied is 1.
  • If a connection between two nodes is requested (from m to n) and the n.level > m.level then they are just simply being connected, no level fixing for other nodes is required.
  • If for requested connection (from m to n) n.level<=m.level then we change n.level to (m.level + 1) and recursively check any dependencies of n for similar level increase (or no increase if the levels on a recursive step are compatible).
  • If we keep the list of recursively entered nodes then we can detect an attempt to cycle and implement some kind of undo operation (returning the levels of all affected nodes to previous values)

For any set of known nodes and connections between them, we just add all nodes applying level=1 to them and just try to apply all known connections between (ignoring and undoing cicles).

The final level information not only allows comparing nodes topologically, but contains other useful information. For example:

  • every node with level = 1 has no incoming connections (every path starts from one of them). So any DAG enumeration can be started from them.
  • The longest path (the number of edges) can not be longer than the (largest level + 1)

I suppose that for some artificial data (n nodes, every Node(n) connected to Node(n + 1)) this algorithm can be very slow. But for real-world data I tried it with (Wikipedia categories - 800,000 nodes - 2,000,000 connections) the time is decent (5-10 minutes) and the number of levels and cycle attempts is low (369 levels, 1000 cycle attempts)

So is this method new or is well-known, but just not widely presented in Wikipedia and other resources? Since it's not a sort (technically), should it be called a data restructuring?


Solution

  • There are some papers on incrementally maintaining the topological order of nodes in a graph, with variations on the algorithm you describe.

    If the graph has n nodes and m edges, you spend time O(m + n) every time you insert an edge. The papers ask how much time will it take to insert k edges? Trivially, O(k * (n + m)). But in fact you can show much better upper bounds - something like O(k * sqrt(m + n)) for large enough k.

    Some links below, there are more:

    http://igitur-archive.library.uu.nl/math/2007-0725-201647/2005-011.pdf

    http://arxiv.org/abs/0802.1059

    http://www.siam.org/proceedings/soda/2009/SODA09_120_benderm.pdf