Tags: r, ggplot2, boxplot, median, ggboxplot

Plotting medians between nested boxplots with ggplot2


I have some data where every person is either "satisfied" or "dissatisfied". Then, every person also has two types of calculated distances. I have no issues plotting the boxplot. However, I cannot figure out how to plot a line between the two medians of the boxplots.

Reproducible Code:

library(ggplot2)

# Four people per satisfaction group; each person has one distance of type A and one of type B
group1.ids <- c('A123', 'B123', 'C123', 'D123')
Dis.data <- data.frame(id = rep(group1.ids, times = 2),
                       satisfaction = rep('Dissatisfied', 8),
                       DistType = c(rep('A', 4),
                                    rep('B', 4)),
                       Dist = runif(8))
group2.ids <- c('E123', 'F123', 'G123', 'H123')
Sat.data <- data.frame(id = rep(group2.ids, times = 2),
                       satisfaction = rep('Satisfied', 8),
                       DistType = c(rep('A', 4),
                                    rep('B', 4)),
                       Dist = runif(8))
data <- rbind(Dis.data, Sat.data)

ggplot(data) +
  geom_boxplot(mapping = aes(x = satisfaction, y = Dist, fill = DistType))

I want this: Line connecting medians between nested groups


Solution

  • One way is to convert your x-axis to a continuous scale, so that the line endpoints can sit at fractional x positions on either side of each category. To do this, first convert the satisfaction and DistType variables to factors (you can control their order with levels= if needed), then use geom_line to add the lines.

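    # Convert both grouping variables to factors so they can be mapped to numbers
    data2 <- transform(data, satisfaction = factor(satisfaction), DistType = factor(DistType))
    # Median per DistType/satisfaction cell, plus an x position shifted 1/6 to the
    # left (A) or right (B) of each satisfaction level; the native pipe |> needs R >= 4.1
    medians <- aggregate(Dist ~ DistType + satisfaction, data2, FUN = median) |>
      transform(x = as.numeric(satisfaction) + as.numeric(DistType)/3 - 0.5)
    medians  # Dist values will differ from run to run because the data come from runif()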
    #   DistType satisfaction      Dist         x
    # 1        A Dissatisfied 0.2042941 0.8333333
    # 2        B Dissatisfied 0.5780955 1.1666667
    # 3        A    Satisfied 0.7128209 1.8333333
    # 4        B    Satisfied 0.6022990 2.1666667
    
    ggplot(data2) +
      geom_boxplot(mapping = aes(x = as.numeric(satisfaction), y = Dist, fill = DistType,
                                 group = interaction(satisfaction, DistType))) +
      scale_x_continuous(name = "Satisfaction",
                         breaks = seq_along(levels(data2$satisfaction)),
                         labels = levels(data2$satisfaction)) +
      geom_line(aes(x = x, y = Dist, group = satisfaction), data = medians)
    

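The hard-coded offset above (`as.numeric(DistType)/3 - 0.5`) only works for two DistType levels. As a rough sketch of how it could be generalized, the helper below spreads n boxes evenly across a total width around each category center. `dodge_x` is a hypothetical name, not a ggplot2 function. The default `width = 0.9` is an assumption about how wide ggplot2 draws dodged boxplots by default, so check it against your own plot.

```r
# Hypothetical helper (not part of ggplot2): x position of the k-th of n
# dodged boxes centred on `center`, where the n boxes share `width` in total.
# width = 0.9 is an assumed default; adjust it to match your plot.
dodge_x <- function(center, k, n, width = 0.9) {
  center + (k - (n + 1) / 2) * width / n
}

# Two DistType levels within the first satisfaction level (center = 1).
# width = 2/3 reproduces the answer's offsets of -1/6 and +1/6.
dodge_x(center = 1, k = 1:2, n = 2, width = 2/3)
# [1] 0.8333333 1.1666667

# The same two boxes under the assumed width of 0.9.
dodge_x(center = 1, k = 1:2, n = 2)
# [1] 0.775 1.225
```

With this helper, the `x` column of `medians` could be computed as `dodge_x(as.numeric(satisfaction), as.numeric(DistType), nlevels(DistType))`. This stays close to the original approach and only replaces the fixed offset.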