Search code examples
graphvizdotsubgraph

Why does Graphviz no longer minimise edge lengths when subgraphs are introduced


I have this Graphviz graph:

digraph
{
   rankdir="LR";
   overlap = true;
   Node[shape=record, height="0.4", width="0.4"];
   Edge[dir=none];

   A B C D E F G H I 

   A -> B -> C
   D -> E -> F
   G -> H -> I

   Edge[constraint=false]

   A -> D -> G

   subgraph clusterX
   {
      A
      B
   }

   subgraph clusterY
   {
      E
      H
      F
      I
   }
}

which produces this output:

Graphviz output

I would have expected the length of the edge between A and D to be minimised so that the nodes would be arranged as:

A B C
D E F
G H I

rather than

D E F
G H I
A B C

This works as expected if I remove the subgraph definitions.

Why does Graphviz place A B C at the bottom when the subgraphs are introduced?


Solution

  • This is not really about minimizing edge lengths, especially since in the example the edges are defined with the attribute constraint=false.

    While this is not a complete answer, I think it can be found somewhere within the following two points:

    • The order of appearance of nodes in the graph is important.
    • Changing rankdir to LR contains unpredictable (or at least difficult to predict) behaviour, and/or probably still a bug or two (search rankdir).

    I'll try to explain as good as I can and understand graphviz, but you may want to go ahead and read right away this reply of Emden R. Gansner on the graphviz mailing list as well as the following answer of Stephen North - they ought to know, so I will cite some of it...


    Why is the order of appearance of nodes important? By default, in a top-down graph, first mentioned nodes will appear on the left of the following nodes unless edges and constraints result in a better layout.

    Therefore, without clusters and rankdir=LR, the graphs appears like this (no surprises):

    A D G
    B E H
    C F I
    

    So far, so good. But what happens when rankdir=LR is applied?

    ERG wrote:

    Dot handles rankdir=LR by a normal TB layout and then rotating the layout counterclockwise by 90 degrees (and then, of course, handling node rotation, edge direction, etc.). Thus, subgraph one is positioned to the left of subgraph two in the TB layout as you would expect, and then ends up lower than it after rotation. If you want subgraph one to be on top, list it second in the graph.

    So if that would be correct, without clusters, the nodes should appear like this:

    G H I
    D E F
    A B C
    

    In reality, they do appear like this:

    A B C
    D E F
    G H I
    

    Why? Stephen North replied:

    At some point we decided that top-to-bottom should be the default,
    even if the graph is rotated, so there's code that flips the flat edges internally.

    So, the graph is layed out TB, rotated counterclock wise and flat edges flipped:

    A D G     G H I     A B C
    B E H --> D E F --> D E F
    C F I     A B C     G H I
    

    While this works quite well for simple graphs, it seems that when clusters are involved, things are a little different. Usually edges are also flipped within clusters (as in clusterY), but there are cases where the flat edge flipping does not work as one would think. Your example is one of those cases.

    Why is the error or limitation in the flipping of those edges? Because the same graphs usually display correctly when using rankdir=TB.


    Fortunately, workarounds are often easy - for example, you may use the order of appearance of the nodes to influence the layout:

    digraph
    {
       rankdir="LR";
       node[shape=record, height="0.4", width="0.4"];
       edge[dir=none];
    
       E; // E is first node to appear
       A -> B -> C;
       D -> E -> F;
       G -> H -> I;
    
       edge[constraint=false]
       A -> D -> G;
    
       subgraph clusterX { A; B; }
       subgraph clusterY { E; F; H; I; }
    }