Tags: r, ggplot2, plot, facet-grid, gtable

Draw Lines between Facets of facet_grid


Intro

I want to draw lines between the facets of a faceted ggplot. The main goal is to connect, with a line, the measurements we want to test against each other. In other words, I want to insert significance bars inside and between the facets of a ggplot boxplot (or any other kind of plot, for that matter).

Research

ggsignif

I know that the ggsignif package does this for non-faceted plots. There are answers that try to work around this limitation, e.g. "Using ggsignif with grouped bar graphs and facet_wrap not working".

Using ggplot_build

There is an approach that could be adapted for my purpose, from the question "ggplot, drawing line between points across facets". Its major drawback is that one has to specify the l parameter of gtable_add_grob manually. I was not able to figure out how to derive the l parameter automatically from just the names of the facet panels we want to start and end with. $layout$name only holds seemingly arbitrary names like "panel-1-1", which is the actual panel name, but how does one get from that to the l parameter needed to specify the start and end of the line?

Background

I'd like to automate the process of drawing lines between faceted plots; for more insight, see my post about Valid Comparisons of Multiple Grouping Variables. In the end I want to use this to automatically annotate plots to show which comparisons are valid, and potentially also to add significance bars to the plot, as ggsignif does, but for faceted plots.

Examples

Create mockup data

So this is the data we are working on:

library(ggplot2)  # needed for the plotting code further down

# Create a dummy data frame
df <- expand.grid(
  St= 1:10,
  MAT= c("A", "B", "C"),
  TREAT= factor(1:2)
)
df$St<- rnorm(nrow(df))

df$OPERATOR<- rep(c("TM", "CX"), each = 5, length.out = nrow(df))

# numbers are randomly generated, so this is different each time
head(df)
            St MAT TREAT OPERATOR
1 -0.488805635   A     1       TM
2  2.658658027   A     1       TM
3  1.680278205   A     1       TM
4  0.779584009   A     1       TM
5  0.713240520   A     1       TM
6 -0.542881937   A     1       CX

Example Plot

This results in the following plot:

# ggplot with multiple facets (nested)
p <- ggplot(data = df, 
            aes(x = TREAT,
                y = St,
                color = MAT))+
  geom_boxplot() +
  ggh4x::facet_nested(~ MAT + OPERATOR) +
  theme_classic()
p

[Image: plot without lines]

What I like to have

I want to draw lines between the facets like this, with the ability to draw lines inside a facet (blue) or even a subfacet (green) and between different facets (black). The y-positions here are chosen arbitrarily, but the ordering should be similar to the one shown. The plot was made with Inkscape.

[Image: plot with lines between facets and inside facets or subfacets]

As you can see we have a simple facet_nested boxplot with lines between some data points to demonstrate between what data I want to draw horizontal lines.

What I tried

Using Google and some LLMs (ChatGPT and Bing), I was able to create some code that automatically selects the facets and subfacets.

# Build the plot
gb <- ggplot_build(p)

# Get panel parameters
ranges <- gb$layout$panel_params

# Get npc position of a specific facet and subfacet
mat_name1 <- "A" # first facet
operator_name1 <- "TM" # first subfacet

# vs
mat_name2 <- "B" # second facet
operator_name2 <- "TM" # subfacet within the second facet

# x axis name
x_name <- "1"

# Find the index of the panel that corresponds to the specified facet and subfacet
# TO DO change MAT and OPERATOR with strings so it is adaptable to the grouping column names
panel_index1 <- which(gb$layout$layout$MAT == mat_name1 & gb$layout$layout$OPERATOR == operator_name1)
panel_index2 <- which(gb$layout$layout$MAT == mat_name2 & gb$layout$layout$OPERATOR == operator_name2)

This returns the correct panel number of the facet or subfacet (in this case 1 and 3). But I was not able to extract the exact coordinates of A-TM-1 vs B-TM-1 (especially the 1, i.e. the x-axis position).

In addition, I don't know how to use that information to draw the lines on top of the plot.

Outro

I hope someone can grasp what I want to achieve and help me understand the logic of the ggplot_build information, so that I can extract the positions and use them to draw lines on top of the plot. Ideally, I could write a function that takes a set of facet and subfacet combinations, including the x-axis location (in this case 1 or 2, or any other x-axis label), and draws those lines between the boxes, e.g. c(c('A', 'TM', '1'), c('B', 'TM', '1')). But a more general understanding of how to interpret the output of ggplot_build and how these plots are constructed would be wonderful.

The most promising approach looks to be the one from "ggplot, drawing line between points across facets", but there I struggle with automatically selecting the right l parameter.

It would be a great help if someone could point me in the right direction, since plots of this kind are day-to-day business for me, and it would save a lot of time and errors if I could automate the process the way ggsignif does.

If you need any additional information, please don't hesitate to ask.

Best

TMC


Solution

  • It is possible to draw lines connecting facets; in fact, there are a few ways to do it, but none of them is easy. If I were doing this I would want the result to be a ggplot object, rather than a ggplot with lines drawn over it (which is itself possible in a couple of different ways).
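
    For the draw-over route mentioned in passing above, the column index that gtable_add_grob() expects as l can be read from the layout table of the plot's gtable instead of being typed by hand. This is only a minimal sketch, assuming a plain facet_grid() plot built from mtcars as a stand-in; the object names toy, g, panels and panel_l are illustrative, and panel naming may differ between ggplot2 versions and faceting functions:

    ```r
    library(ggplot2)

    # Stand-in plot: three panels side by side, one per cyl value
    toy <- ggplot(mtcars, aes(factor(am), mpg)) +
      geom_boxplot() +
      facet_grid(~ cyl)

    # Convert to a gtable; its layout data frame has one row per grob
    g <- ggplotGrob(toy)

    # Keep only the panel grobs and their cell positions (t, l, b, r)
    panels <- g$layout[grepl("^panel", g$layout$name),
                       c("name", "t", "l", "b", "r")]
    panels <- panels[order(panels$l), ]

    # Named lookup: panel name -> column index usable as `l` (or `r`)
    panel_l <- setNames(panels$l, panels$name)
    panel_l
    ```

    The lookup only gives the gtable column of each panel; positions inside a panel would still have to be converted separately.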

    Any method ending with a vanilla ggplot object must have clip = "off" set in its coordinates to allow lines to stretch between panels. Since facet panels are drawn sequentially onto the page, we must also turn the panel.background into an element_blank(). Any grid lines or vertical axis lines will be drawn over your facet-spanning lines, but fortunately your chosen theme is perfect for this.

    Another issue is that you need to set hard co-ordinate limits, otherwise the co-ordinates will simply expand to accommodate your lines. This means in effect that you need to have all factor levels present in each facet, and can't use scales = "free_x". Again, this is not a problem with your set-up.
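
    The two requirements so far (clipping off, fixed x limits) can be seen on a toy plot. The following is only a sketch with made-up data (toy, seg and p are illustrative names, not part of the question's data); it draws one segment that starts in the left panel and is allowed to run past that panel's right edge:

    ```r
    library(ggplot2)

    # Made-up data: two x levels in each of two facets
    toy <- data.frame(
      g = factor(rep(c("a", "b"), each = 10)),
      f = rep(c("left", "right"), times = 10),
      y = seq_len(20)
    )

    # One segment, drawn only in the "left" panel, ending beyond its x range
    seg <- data.frame(f = "left", x = 1, xend = 4, y = 15)

    p <- ggplot(toy, aes(g, y)) +
      geom_boxplot() +
      geom_segment(data = seg,
                   aes(x = x, xend = xend, y = y, yend = y),
                   inherit.aes = FALSE) +
      facet_grid(~ f) +
      # clip = "off" lets the segment leave its panel;
      # xlim stops the panel from stretching to fit xend
      coord_cartesian(clip = "off", xlim = c(1, 2)) +
      theme_classic() +
      theme(panel.background = element_blank())
    p
    ```

    Removing either the clip or the xlim argument from coord_cartesian() is a quick way to see why both are needed.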

    If you draw the lines with geom_segment, then it's easy to specify the starting point at the left of the line; the main difficulty comes in finding the x value on the right. This needs to be calculated for each line. Effectively, you need to ask "if this panel's x axis was numeric and extended indefinitely, at what x value would I want the line to end?".
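
    As a worked example of that question, here is the arithmetic for the layout in this post, written out in base R. It assumes two x levels per panel, two subfacets per facet, and the 1/3 of an x unit that the xpos() function in this answer adds per panel; the names n_x, n_sub, gap and line_end are illustrative only:

    ```r
    n_x   <- 2    # x levels per panel (TREAT: 1, 2)
    n_sub <- 2    # subfacets per facet (OPERATOR: CX, TM)
    gap   <- 1/3  # per-panel allowance, taken from xpos() in this answer

    sub_step  <- n_x + gap          # x units from one subfacet to the next
    main_step <- sub_step * n_sub   # x units from one facet to the next

    # End of a line, expressed in the x units of the panel where it starts.
    # Facets and subfacets are given as 1-based positions, left to right.
    line_end <- function(facet_from, facet_to, sub_from, sub_to, x_to) {
      (facet_to - facet_from) * main_step +
        (sub_to - sub_from) * sub_step +
        x_to
    }

    # From A/CX to B/CX, ending at x level 1: one facet to the right
    line_end(facet_from = 1, facet_to = 2, sub_from = 1, sub_to = 1, x_to = 1)
    # about 5.67

    # From A/CX to A/TM, ending at x level 1: one subfacet to the right
    line_end(facet_from = 1, facet_to = 1, sub_from = 1, sub_to = 2, x_to = 1)
    # about 3.33
    ```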

    You can write a function to work this out for you, returning a data frame you can feed to geom_segment (or to geom_textsegment from geomtextpath if you want labels):

    xpos <- function(data, fac1, fac2, xvar, yvals, labels) {
    
      get_xpos <- function(data, fac1, fac2, xvar) {
        datafac  <- list(xvar = data[[names(xvar)[1]]],
                         fac1 = data[[names(fac1)[1]]],
                         fac2 = data[[names(fac2)[1]]])
        datafac  <- lapply(datafac, as.factor)
        datalevs <- lapply(datafac, levels)
        datanum <- lapply(datalevs, function(x) as.numeric(factor(x)))
        datanum[-1] <- lapply(datanum[-1], function(x) x - 1)
        datanum$fac2 <- (max(datanum$xvar) + 1/3) * datanum$fac2
        datanum$fac1 <- (max(datanum$xvar + 1/3) + max(datanum$fac2)) * datanum$fac1
        levs <- Map(match, list(unlist(xvar), unlist(fac1), unlist(fac2)), datalevs)
        final_vals <- Map(function(x, i) x[i], datanum, levs)
        facet_add <- final_vals$fac1 + final_vals$fac2
        facet_add[2] - facet_add[1] + final_vals$xvar[2]
      }
      
      names(fac1[[1]]) <- rep(names(fac1), length(fac1[[1]]))
      names(fac2[[1]]) <- rep(names(fac2), length(fac2[[1]]))
      names(xvar[[1]]) <- rep(names(xvar), length(xvar[[1]]))
      
      x <- sapply(seq_along(xvar[[1]]), function(i) {
        get_xpos(data, fac1[[1]][i], fac2[[1]][i], xvar[[1]][i])
      })
      d <- data.frame(sapply(fac1[[1]], `[`, 1), sapply(fac2[[1]], `[`, 1),
                      sapply(xvar[[1]], `[`, 1), x, yvals[[1]], labels)
      setNames(d, c(names(fac1), names(fac2), names(xvar), 
                    "xpos", names(yvals), "labels"))
    }
    

    It still takes a bit of work to call this function, because we need to feed it a list of the start and end levels of each of our faceting and x axis variables:

    segs  <- xpos(data = df, 
               xvar = list(TREAT = list(c(1, 2), c(1, 1), 
                                        c(1, 1), c(1, 1))), 
               fac1 = list(MAT = list(c("A", "A"), c("A", "A"), 
                                      c("A", "B"), c("A", "C"))), 
               fac2 = list(OPERATOR = list(c("CX", "CX"), c("CX", "TM"), 
                                           c("CX", "CX"), c("TM", "TM"))),
               yvals = list(St = c(1.5, 1.8, 2.1, 2.4)),
               labels = c("Label 1", "Label 2", "Label 3", "Label 4"))
    

    But at least our final plotting code is straightforward:

    library(geomtextpath)
    
    ggplot(data = df, aes(x = TREAT, y = St, color = MAT)) +
      geom_boxplot() +
      geom_textsegment(data = segs,
                       aes(xend = xpos, yend = St, group = MAT, label = labels), 
                       color = c("green4", "blue", "black", "black"),
                       linewidth = 1, vjust = -0.2) +
      ggh4x::facet_nested(~ MAT + OPERATOR) +
      coord_cartesian(clip = "off", xlim = c(1, 2)) +
      theme_classic() +
      theme(panel.background = element_blank())
    


    Of course, this is all a bit cumbersome. It could be tweaked to give it an easier interface, but all the above caveats make me wonder whether this is worth the effort. That really depends on how you plan to use it.
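
    One possible direction for such an interface, sketched under assumptions: describe each line once as a start and an end, then derive the nested lists that xpos() expects. The helper pairs_for() and the comparisons list are hypothetical and not part of any package. Note that the x levels are given as character here, whereas the call above uses numbers, so passing the result to xpos() has not been checked:

    ```r
    # Each comparison names the start and the end of one line
    comparisons <- list(
      list(from = c(MAT = "A", OPERATOR = "CX", TREAT = "1"),
           to   = c(MAT = "A", OPERATOR = "CX", TREAT = "2")),
      list(from = c(MAT = "A", OPERATOR = "CX", TREAT = "1"),
           to   = c(MAT = "B", OPERATOR = "CX", TREAT = "1"))
    )

    # Hypothetical helper: collect the (start, end) pair of one variable
    # across all comparisons, in the shape list(name = list(c(start, end), ...))
    pairs_for <- function(comparisons, var) {
      pairs <- lapply(comparisons, function(cmp) {
        unname(c(cmp$from[[var]], cmp$to[[var]]))
      })
      setNames(list(pairs), var)
    }

    fac1 <- pairs_for(comparisons, "MAT")
    fac2 <- pairs_for(comparisons, "OPERATOR")
    xvar <- pairs_for(comparisons, "TREAT")
    str(fac1)
    ```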