How to align nodes that are in different subgraphs in Graphviz?

The code below creates this diagram. I would like to make all 3 subgraphs 5 nodes tall, so that, for example, data_source_1, process_1 and product_1 are all horizontally aligned (and likewise process_5 and product_4). That would make the diagram easier to read. Is there a way of doing that? (I am using R to make the plot, but the issue is with the Graphviz syntax.)


library(DiagrammeR)

my_graph <- grViz(paste0("
  digraph {

  graph[splines = ortho,
  ordering = 'in',
  rankdir='LR',
  concentrate=true,
  labeljust= 'c',
  layout = dot,
  overlap =true,
  outputorder = nodesfirst]
  node [shape = rectangle, style = filled, fillcolor = 'blanchedalmond']
  edge[color = black, arrowhead=vee, minlen = 1]
  
  # draw lines 
  
  # from survey data to input
    data_source_1 -> process_1
    data_source_2 -> process_2
    data_source_3 -> process_2
    data_source_3 -> process_2
    data_source_4 -> process_2
    data_source_5 -> process_3
    data_source_6 -> process_4
    data_source_6 -> process_5

  # from input to cleaning
    process_1 -> product_1
    process_2 -> product_2
    process_3 -> product_3
    process_4 -> product_3
    process_5 -> product_4
    
  
# add clusters    
subgraph cluster_1 {
        node [style=filled]
        'data_source_1' 'data_source_2' 'data_source_3' 'data_source_4' 'data_source_5' 'data_source_6' 
        color='red';
        label = 'Data';
        style=filled;

}
    
    subgraph cluster_2 {
        node [style=filled]
        'process_1' 'process_2' 'process_3' 'process_4' 'process_5'
        color='lightblue';
        label = 'process';
        style=filled;

    }
        subgraph cluster_3 {
        node [style=filled]
        'product_1' 'product_2' 'product_3' 'product_4'
        color='yellow';
        label = 'product';
        style=filled;

    }
}"))


my_graph

Solution

Darn. I just saw your note about avoiding invisible nodes, which I've used :-( I'll leave this here in case something in it is helpful; nothing I've read so far suggests a way to force dot to lay things out this tightly without them. It's possible the approach I took won't be affected by variable data size.

I'm no master of dot, but I do have an option and suboption for you.

As written, the code at the end of this answer gives you the same columns, in the same order, with the same varying heights. If you follow the comment at the top of that code, you get matched-height columns.
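
Stripped of your data, the mechanism I'm relying on looks roughly like the sketch below. Treat it as an illustration rather than a drop-in: the names r1..r3, a1..a3 and b1..b3 are made up for this example, and b3 is an invisible filler node of the kind you wanted to avoid.

```dot
// Reduced sketch of the row-alignment trick; every node name here is hypothetical.
digraph rows {
  graph [ newrank=true ]   // lets rank=same groups span nodes from different clusters
  node  [ shape=rectangle ]

  // Invisible "spine": one point node per row, chained so each row gets its own rank.
  {
    node [ shape=point, style=invis ]
    edge [ style=invis ]
    r1 -> r2 -> r3
  }

  subgraph cluster_a {
    label = "A"
    a1; a2; a3
  }

  subgraph cluster_b {
    label = "B"
    b1; b2
    b3 [ style=invis ]     // filler so column B is as tall as column A
  }

  a1 -> b1
  a2 -> b2
  a3 -> b2

  // Pin one node from each column to each row of the spine.
  { rank=same; r1; a1; b1 }
  { rank=same; r2; a2; b2 }
  { rank=same; r3; a3; b3 }
}
```

The full listing is the same idea with six rows, three clusters and the filler nodes commented out by default.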

There is some extra margin on the top and left from the hidden structure, so it's not perfect. The most troubling issue was trying to define the horizontal order, which is why "head" and "ordering=in" are in there, although I'm not certain they're deterministic. Depending on your use case, you can skip the entire padding issue by autocropping the image, for example with this ImageMagick command, which crops the output image from dot into a new cropped.png file:

convert dot-output.png -flatten -trim +repage cropped.png

I also tried another approach, using a grid to get more control over margins, but that one didn't lead to any obvious way to replicate your colored columns.

I suspect Graphviz's dot really isn't intended for precision layout, given that there don't seem to be any options for four-way margin control, only two-way (x versus y). You should still be able to pad the Product subgraph if you want it to be more even horizontally; see https://graphviz.org/docs/attrs/margin/ for more info. There was one other interesting example at https://graphviz.org/Gallery/directed/Linux_kernel_diagram.html but it's opaque enough to be difficult to use without any commentary on how it works.
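
For the padding idea, something like the following is what I have in mind. It is only a sketch: the value 24 is an arbitrary choice, and I'm assuming the single-number form of margin here, which per the linked page sets the space (in points, default 8) between a cluster's nodes and its bounding box.

```dot
// Sketch: widen the padding inside one cluster. The value 24 is an assumption, not a tuned number.
digraph margin_demo {
  node [ shape=rectangle, style=filled, fillcolor=blanchedalmond ]

  subgraph cluster_product {
    label  = "Product"
    color  = yellow
    style  = filled
    margin = 24          // points between the nodes and the cluster border (default is 8)

    product_1
    product_2
  }
}
```

In the full listing the same margin line would go inside cluster_product; how well it evens out the columns will depend on your label widths.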

Lastly, if you enable the three empty cells, you might want to move some of the process/product nodes to lower slots to improve the arrow layout, although of course this won't apply if your column lengths vary in automated use.
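
To show what I mean by moving a node to a lower slot: the row a node lands in is decided by which rank=same group lists it, so swapping a real node with a filler between groups moves it down. Here is a small sketch with made-up names (a1..a4, b1, b2, gap_1, gap_2), not your actual data:

```dot
// Sketch: choose which row a node occupies by choosing its rank=same group.
// All names are hypothetical; gap_1 and gap_2 are invisible fillers.
digraph swapped_rows {
  graph [ newrank=true ]
  node  [ shape=rectangle ]

  // Invisible spine, one point node per row.
  {
    node [ shape=point, style=invis ]
    edge [ style=invis ]
    r1 -> r2 -> r3 -> r4
  }

  subgraph cluster_a {
    label = "A"
    a1; a2; a3; a4
  }

  subgraph cluster_b {
    label = "B"
    b1; b2
    gap_1 [ style=invis ]
    gap_2 [ style=invis ]
  }

  a1 -> b1
  a2 -> b1
  a3 -> b2
  a4 -> b2

  { rank=same; r1; a1; b1 }
  { rank=same; r2; a2; gap_1 }   // filler takes the slot b2 would get if packed from the top
  { rank=same; r3; a3; b2 }      // b2 moved down, level with one of its inputs
  { rank=same; r4; a4; gap_2 }
}
```

Whether this actually gives cleaner arrows depends on the edges, so it's worth trying a couple of arrangements.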

# remove all the following "#" characters to get matched-height columns

digraph {
  graph [
     newrank = true,
     nodesep = 0.7,
     ranksep = 0.3,
     overlap = true,
     splines = ortho,
     layout  = dot,
     concentrate = true,
     compound = true,
  ]
  node [ shape=rectangle, style=filled, fillcolor=blanchedalmond ]
  edge [ color=black, arrowhead=vee, minlen=1 ]

  {
    node [ shape=point height=0 width=0 style=invis ]
    edge [ arrowsize=0 style=invis ]
    r1    r2    r3    r4    r5    r6
    r1 -> r2 -> r3 -> r4 -> r5 -> r6
  }

  subgraph cluster_data {
    label = Data;
    node [style = filled]
    color = red;
    style = filled;
    
    data_source_1
    data_source_2
    data_source_3
    data_source_4
    data_source_5
    data_source_6
  }

  subgraph cluster_process {
    label = Process;
    node [ style=filled ]
    color = lightblue;
    style = filled;

    process_1
    process_2
    process_3
    process_4
    process_5
#   process_6 [ shape=plain style=invis ]

    data_source_1 -> process_1
    data_source_2 -> process_2
    data_source_3 -> process_2
    data_source_3 -> process_2
    data_source_4 -> process_2
    data_source_5 -> process_3
    data_source_6 -> process_4
    data_source_6 -> process_5
  }

  subgraph cluster_product {
    label = Product;
    node [style = filled]
    color = yellow;
    style = filled;

    product_1
    product_2
    product_3
    product_4
#   product_5 [ shape=plain style=invis ]
#   product_6 [ shape=plain style=invis ]
    
    process_1 -> product_1
    process_2 -> product_2
    process_3 -> product_3
    process_4 -> product_3
    process_5 -> product_4
  }

  subgraph R0 {
    graph [ ordering=in ]
    node [ style=invis ]
    edge [ style=invis ]
    head [ rank=min ]
    head -> data_source_1
    head -> process_1
    head -> product_1   
  }
  subgraph R1 {
    graph [ rank=same ]
    r1 data_source_1 process_1 product_1
  }
  subgraph R2 {
    graph [ rank=same ]
    r2 data_source_2 process_2 product_2
  }
  subgraph R3 {
    graph [ rank=same ]
    r3 data_source_3 process_3 product_3
  }
  subgraph R4 {
    graph [ rank=same ]
    r4 data_source_4 process_4 product_4
  }
  subgraph R5 {
    graph [ rank=same ]
    r5 data_source_5 process_5 # product_5
  }
  subgraph R6 {
    graph [ rank=same ]
    r6 data_source_6 # process_6 product_6
  }
}