
Connect medians within nested groups in ggplot?


I am trying to plot these different groups and connect the medians of every series (A,B,C) within each group with a line.

    library(ggplot2)

    n <- 60
    data <- data.frame(series = rep(LETTERS[1:3], n/3),
                       sex = rep(c("F","M"), each = 30),
                       setting = rep(c("wild","rural"), n/2),
                       fit = rnorm(n))

    ggplot(data, aes(x = sex, y = fit, fill = series)) +
      geom_boxplot(aes(color = sex), width = 0.3, alpha = 0.5,
                   lwd = 0.8, position = position_dodge(width = 0.6)) +
      facet_grid(~setting) +
      # fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
      stat_summary(fun = median, geom = "point", shape = 23, size = 2,
                   position = position_dodge(width = 0.6)) +
      geom_text(aes(y = -2.5, label = series), position = position_dodge(width = 0.6)) +
      geom_point(shape = 20, alpha = 0.2,
                 position = position_jitterdodge(dodge.width = 0.6, jitter.width = 0.25)) +
      ggnetwork::theme_blank()

This is what it looks like so far (image omitted).

I have been playing around with geom_line and stat_summary, but not very successfully, as the grouping always seems wrong. Here is what I tried with stat_summary. It draws what look like the 4 lines I actually want (connecting A to C in each group), but for some reason they are not overlaid on the boxplots:

    stat_summary(fun = median, colour = "red", geom = "line",
                 aes(x = series, group = interaction(setting, sex)),
                 position = position_dodge(width = 0.6))

I have also found a very similar answered question here, but I could not get that solution working for me. Running the second part gives this error: "Error in geom_line(): Problem while computing aesthetics. ℹ Error occurred in the 4th layer. Caused by error in FUN(): ! object 'series' not found".

At this point I'd also be happy to accept any type of workaround, but couldn't come up with it myself. Thanks!


Solution

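  • This is a similar approach to Stefan's, but with the calculations done on the fly within ggplot. Each series/sex combination gets a numeric x position, so the medians of A, B and C can be joined by a line within each sex: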

    library(ggplot2)
    
    n <- 60
    data <- data.frame(series=rep(LETTERS[1:3], n/3), 
                       sex=rep(c("F","M"),each=30), 
                       setting=rep(c("wild","rural"),n/2),
                       fit=rnorm(n))
    
    # x is computed on the fly: the six series/sex combinations are numbered
    # 1 to 6, and the "M" block (4 to 6) is shifted right by 1.5 to leave a gap
    ggplot(data, aes(x = as.numeric(interaction(series, sex)) +
                       1.5 * ((as.numeric(interaction(series, sex)) - 1) %/% 3), 
                     y = fit, fill = series)) +
      geom_boxplot(width = 0.5, aes(color = sex), alpha = 0.5) + 
      # median of each box, drawn as a diamond
      geom_point(fun = median, stat = "summary", shape = 23, size = 2) +
      geom_text(aes(y = -2.5, label = series)) +
      geom_point(shape = 20, alpha = 0.2, position = position_jitter(0.25)) +
      # one line per sex in each facet, connecting the medians of A, B and C
      geom_line(stat = "summary", fun = median, aes(group = sex)) +
      facet_grid(~setting) +
      # label the centre of each block of three boxes with its sex
      scale_x_continuous(NULL, breaks = c(2, 6.5), labels = c("F", "M"),
                         limits = c(0, 8.5)) +
      ggnetwork::theme_blank() +
      theme(axis.text.x.bottom = element_text(), strip.background = element_blank())
    

    Created on 2022-12-14 with reprex v2.0.2
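
To see what the x expression inside aes() actually produces, it may help to evaluate it outside of ggplot. The following is a minimal base R sketch; the helper name dodge_x and its arguments gap and per_group are hypothetical, introduced only for illustration, and are not part of ggplot2 or of the answer above.

```r
# Hypothetical helper (not part of ggplot2): reproduces the x expression
# used inside aes() in the answer, with the constants exposed as arguments.
dodge_x <- function(series, sex, gap = 1.5, per_group = 3) {
  # interaction() numbers the combinations A.F, B.F, C.F, A.M, B.M, C.M as 1..6
  idx <- as.numeric(interaction(series, sex))
  # integer division gives 0 for the first block (F) and 1 for the second (M),
  # so only the second block is shifted right by `gap`
  idx + gap * ((idx - 1) %/% per_group)
}

# One row per series/sex combination
grid <- expand.grid(series = LETTERS[1:3], sex = c("F", "M"),
                    stringsAsFactors = TRUE)
grid$x <- dodge_x(grid$series, grid$sex)
print(grid)

# Centre of each block of three boxes, i.e. where the axis labels belong
centres <- tapply(grid$x, grid$sex, mean)
print(centres)
```

The F boxes land at x = 1, 2, 3 and the M boxes at x = 5.5, 6.5, 7.5, so the block centres are 2 and 6.5, which matches the breaks passed to scale_x_continuous. Because x is now a plain number rather than a dodged discrete position, geom_line needs no position adjustment to meet the boxes.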