Tags: r, ggplot2, time-series, boxplot, position-dodge

Order nested boxplot groups along a continuous x-axis and across facets [R]


I'd like the boxplot fills to appear in the same order within every temporal group and across facets, using ggplot2 in R. Boxes should be drawn along a continuous x-axis, and the box width should scale with the number of values in each group, as position_dodge2() provides.

Note that the first box within a temporal group (marked by vertical lines) is sometimes blue and sometimes red.

Converting the variable used for coloring to a factor does not help. Nor does the first value of that variable within a temporal group, or its frequency, seem to determine which color comes first. Otherwise position_dodge2() does a very good job here, producing the boxes exactly as I want.

Minimal example:

library(ggplot2)
library(dplyr)  # provides lead(), used for break.time below
time_data = data.frame( time = c(1:100),
                        y.var = rep(seq(0,1,.02),2)[1:100],
                        f.var = rep(c("A","B","C","D"),25),
                        time.group = c ( rep(c("q"),10),
                                         rep(c("r"),35),
                                         rep(c("s"),5),
                                         rep(c("t"),30),
                                         rep(c("u"),20)
                                         ),
                        col.group = rep(c(T,F,T),40)[1:100]
                        )

break.time = time_data$time[ which( time_data$time.group != lead(time_data$time.group) )]

ggplot()+
  facet_grid(f.var ~.)+
  geom_boxplot(data = time_data, aes( x = time, y = y.var,
                    fill = col.group,
                    group = paste(time.group, col.group)))+
  geom_vline(xintercept = break.time) 

Thanks for any help.


Solution

  • The issue is that the position of each box is determined by the time values in its group, i.e. dodging only has an effect where mean(time) is the same for both col.groups within a time.group. Otherwise the position is determined by mean(time).

    To make this visible I added a stat_boxplot() layer with geom = "text". From this you can see that the order of the boxes is determined by mean(time):

    ggplot() +
      facet_grid(f.var ~ .) +
      geom_boxplot(data = time_data, aes(
        x = time, y = y.var,
        fill = col.group,
        group = paste(time.group, col.group)
      )) +
      stat_boxplot(
        data = time_data, geom = "text",
        aes(
          label = after_stat(x),
          x = time,
          y = stage(y.var, after_stat = 0),
          group = paste(time.group, col.group)
        ),
        vjust = 0,
        position = position_dodge2(.75)
      ) +
      geom_vline(xintercept = break.time)
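
    The same positions can also be checked without plotting. The following is only a sketch in base R: it rebuilds the columns of time_data that matter here and assumes, as I understand stat_boxplot()'s behaviour for a continuous x, that each box is centred at the midpoint of its group's time range (which coincides with mean(time) when the values are evenly spaced). The name centers is introduced purely for illustration.

    ```r
    # Rebuild the relevant columns of the question's data (base R only)
    time_data <- data.frame(
      time       = 1:100,
      f.var      = rep(c("A", "B", "C", "D"), 25),
      time.group = rep(c("q", "r", "s", "t", "u"), times = c(10, 35, 5, 30, 20)),
      col.group  = rep(c(TRUE, FALSE, TRUE), 40)[1:100]
    )

    # Assumption: a box on a continuous x sits at the midpoint of its group's x range
    centers <- aggregate(
      time ~ col.group + time.group + f.var,
      data = time_data,
      FUN  = function(x) mean(range(x))
    )
    names(centers)[names(centers) == "time"] <- "center"

    # Inspect one facet, sorted by position within each time group
    b <- centers[centers$f.var == "B", ]
    b[order(b$time.group, b$center), ]
    ```

    In facet B, for instance, time group q has its FALSE box at 2 and its TRUE box at 8, while time group s has TRUE at 46 and FALSE at 50, so the fill order flips between the two groups.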
    


    That said, one option to achieve your desired result is to make the x positions the same for both col.groups within each time.group. This can be done with stage() and an after_stat = calculation, e.g. using ave(). As a first step we can get the order right per time group by computing the position per group.

    library(ggplot2)
    
    ggplot() +
      facet_grid(f.var ~ .) +
      geom_boxplot(data = time_data, aes(
        x = stage(time, after_stat = ave(x, group, FUN = mean)),
        y = y.var,
        fill = col.group,
        group = paste(col.group, time.group)
      )) +
      geom_vline(xintercept = break.time) +
      scale_color_manual(values = "black", guide = "none")
    


    However, getting the same order of the col.groups for all time groups requires even more effort. The issue is that the time group information has to be part of the dataset after the stat has been applied. The only way I found to achieve that was to use stage() once again: map the time group to the color aesthetic, set color to a constant value after the stat, then replace that value with the default "black" and drop the color legend using scale_color_manual():

    ggplot() +
      facet_grid(f.var ~ .) +
      geom_boxplot(data = time_data, aes(
        x = stage(time, after_stat = ave(x, color, FUN = mean)),
        y = y.var,
        fill = col.group,
        color = stage(time.group, after_stat = "1"),
        group = paste(col.group, time.group)
      )) +
      geom_vline(xintercept = break.time) +
      scale_color_manual(values = "black", guide = "none")
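
    To see what the ave() call contributes, here is the same operation on a toy vector. This is only an illustration: x and time_group are hypothetical stand-ins for the after_stat(x) values and the time group carried along in color, not objects that ggplot2 exposes under these names.

    ```r
    # Hypothetical stand-ins: two box centres in each of two time groups
    x          <- c(2, 8, 46, 50)
    time_group <- c("q", "q", "s", "s")

    # ave() returns a vector as long as x, with each element replaced by
    # the mean of its group
    shared_x <- ave(x, time_group, FUN = mean)
    shared_x
    ```

    Both boxes of a time group end up on one shared x value (5 for q, 48 for s), so their side-by-side arrangement is left to the dodging instead of to their original positions.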
    
