I had interest in Direct Acyclic Graphs (DAG) for a long time and after reading about Topological sort at Wikipedia, I did not find any special mentioning of an approach involving layers numbering (although layers are extensively mentioned for drawing). With this approach the graph is not technically topologically sorted, but knowing that every node contains the correct number for layer (level), we always can tell whether a particular node "bigger" than another topologically. On the other side as long as we don't have an ordered list, we can not enumerate the nodes topologically (although this can be done with a final conventional sort that compares the levels of the nodes).
This approach allows implementing arbitrary connecting while maintaining the correctness of levels information. The steps can be:
For any set of known nodes and connections between them, we just add all nodes applying level=1 to them and just try to apply all known connections between (ignoring and undoing cicles).
The final level information not only allows comparing nodes topologically, but contains other useful information. For example:
I suppose that for some artificial data (n nodes, every Node(n) connected to Node(n + 1)) this algorithm can be very slow. But for real-world data I tried it with (Wikipedia categories - 800,000 nodes - 2,000,000 connections) the time is decent (5-10 minutes) and the number of levels and cycle attempts is low (369 levels, 1000 cycle attempts)
So is this method new or is well-known, but just not widely presented in Wikipedia and other resources? Since it's not a sort (technically), should it be called a data restructuring?
There are some papers on incrementally maintaining the topological order of nodes in a graph, with variations on the algorithm you describe.
If the graph has n
nodes and m
edges, you spend time O(m + n)
every time you insert an edge.
The papers ask how much time will it take to insert k
edges? Trivially, O(k * (n + m))
. But in fact you can show much better upper bounds - something like O(k * sqrt(m + n))
for large enough k
.
Some links below, there are more:
http://igitur-archive.library.uu.nl/math/2007-0725-201647/2005-011.pdf
http://arxiv.org/abs/0802.1059
http://www.siam.org/proceedings/soda/2009/SODA09_120_benderm.pdf