
How to add a line in boxplot?


I would like to add lines connecting the means in my boxplot.

My code:

library(ggplot2)
library(ggthemes)

Gp=factor(c(rep("G1",80),rep("G2",80)))
Fc=factor(c(rep(c(rep("FC1",40),rep("FC2",40)),2)))
Z <-factor(c(rep(c(rep("50",20),rep("100",20)),4)))
Y <- c(0.19 , 0.22 , 0.23 , 0.17 , 0.36 , 0.33 , 0.30 , 0.39 , 0.35 , 0.27 , 0.20 , 0.22 , 0.24 , 0.16 , 0.36 , 0.30 , 0.31 , 0.39 , 0.33 , 0.25 , 0.23 , 0.13 , 0.16 , 0.18 ,  0.20 , 0.16 , 0.15 , 0.09 , 0.18 , 0.21 , 0.20 , 0.14 , 0.17 , 0.18 , 0.22 , 0.16 , 0.14 , 0.11 , 0.18 , 0.21 , 0.30 , 0.36 , 0.40 , 0.42 , 0.26 , 0.23 , 0.25 , 0.30 ,  0.27 , 0.15 , 0.29 , 0.36 , 0.38 , 0.42 , 0.28 , 0.23 , 0.26 , 0.29 , 0.24 , 0.17 , 0.24 , 0.14 , 0.17 , 0.16 , 0.15 , 0.21 , 0.19 , 0.15 , 0.16 , 0.13 , 0.25 , 0.12 ,  0.15 , 0.15 , 0.14 , 0.21 , 0.20 , 0.13 , 0.14 , 0.12 , 0.29 , 0.29 , 0.29 , 0.24 , 0.21 , 0.23 , 0.25 , 0.33 , 0.30 , 0.27 , 0.31 , 0.27 , 0.28 , 0.25 , 0.22 , 0.23 , 0.23 , 0.33 , 0.29 , 0.28 , 0.12 , 0.28 , 0.22 , 0.19 , 0.22 , 0.14 , 0.15 , 0.15 , 0.21 , 0.25 , 0.11 , 0.27 , 0.22 , 0.17 , 0.21 , 0.15 , 0.16 , 0.15 , 0.20 , 0.24 ,  0.24 , 0.25 , 0.36 , 0.24 , 0.34 , 0.22 , 0.27 , 0.26 , 0.23 , 0.28 , 0.24 , 0.23 , 0.36 , 0.23 , 0.35 , 0.21 , 0.25 , 0.26 , 0.23 , 0.28 , 0.24 , 0.23 , 0.09 , 0.16 , 0.16 , 0.14 , 0.18 , 0.18 , 0.18 , 0.12 , 0.22 , 0.23 , 0.09 , 0.17 , 0.15 , 0.13 , 0.17 , 0.19 , 0.17 , 0.11)
X <- factor(c(rep(c(rep("B1",10),rep("B2",10)),8)))
DATA=data.frame(Y,X,Z,Fc,Gp)
p <- qplot(X, Y, data = DATA, geom = "boxplot", fill = Z, na.rm = TRUE,
           outlier.size = NA, outlier.colour = NA) +
  facet_grid(Gp ~ Fc) +
  theme_light() +
  scale_colour_gdocs() +
  theme(legend.position = "bottom") +
  # mean of each box, dodged to sit on top of it
  # (fun.y is called fun in ggplot2 >= 3.3.0)
  stat_summary(fun.y = mean, geom = "point", shape = 23,
               position = position_dodge(width = .75))

I have:

[image: current plot, boxplots with a point at each mean]

And this is the plot I want:

[image: expected plot, with lines connecting the means]

I tried this

p + stat_summary(fun.y=mean, geom="line", aes(group = factor(Z)))

and this

p + stat_summary(fun.y=mean, geom="line", aes(group = factor(X)))

but neither worked. Instead, I got the following message, once per facet:

geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?

Thanks for your help!


Solution

  • You can try a tidyverse solution as well:

    library(tidyverse)
    DATA %>% 
      ggplot() + 
      geom_boxplot(aes(X, Y, fill = Z)) +
      # mean points, nudged onto the centres of the dodged boxes
      stat_summary(aes(X, Y, fill = Z), fun.y = mean, geom = "point",
                   position = position_nudge(x = c(-0.185, 0.185))) +
      # one segment per X within each facet, joining the two Z means
      geom_segment(data = . %>%
                     group_by(X, Z, Gp, Fc) %>% 
                     summarise(M = mean(Y)) %>% 
                     ungroup() %>% 
                     mutate(Z = paste0("C", Z)) %>%  # column names become C100, C50
                     spread(Z, M),
                   aes(x = as.numeric(X) - 0.185, y = C100, 
                       xend = as.numeric(X) + 0.185, yend = C50)) +
      facet_grid(Gp ~ Fc)
    

    [image: resulting plot]

    The idea is the same as in d.b.'s answer: create a data frame for the geom_segment call. The advantage here is the dplyr workflow, so everything is done in a single pipeline. This is the data frame that gets passed to geom_segment:

    DATA %>% 
      group_by(X, Z, Gp , Fc) %>% 
      summarise(M=mean(Y)) %>% 
      ungroup() %>% 
      mutate(Z=paste0("C",Z)) %>% 
      spread(Z, M) 
    # A tibble: 8 x 5
           X     Gp     Fc  C100   C50
    * <fctr> <fctr> <fctr> <dbl> <dbl>
    1     B1     G1    FC1 0.169 0.281
    2     B1     G1    FC2 0.170 0.294
    3     B1     G2    FC1 0.193 0.270
    4     B1     G2    FC2 0.168 0.269
    5     B2     G1    FC1 0.171 0.276
    6     B2     G1    FC2 0.161 0.292
    7     B2     G2    FC1 0.188 0.269
    8     B2     G2    FC2 0.163 0.264
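
In recent tidyr releases, spread() is superseded by pivot_wider(), and summarise() in dplyr 1.0.0 or later accepts .groups = "drop" in place of a separate ungroup(). Below is a minimal sketch of the same summary step under those assumptions. The data frame `toy` and its values are made up for illustration (it is not the DATA from the question), so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same columns as DATA (values are made up).
toy <- data.frame(
  X  = factor(rep(c("B1", "B2"), each = 4)),
  Z  = factor(rep(c("50", "50", "100", "100"), times = 2)),
  Gp = factor("G1"),
  Fc = factor("FC1"),
  Y  = c(0.2, 0.4, 0.1, 0.3, 0.3, 0.5, 0.2, 0.2)
)

# One row per X/Gp/Fc cell, with the mean of each Z level in its own column.
means_wide <- toy %>%
  group_by(X, Z, Gp, Fc) %>%
  summarise(M = mean(Y), .groups = "drop") %>%
  pivot_wider(names_from = Z, values_from = M, names_prefix = "C")

means_wide
```

The result has the same shape as the tibble above (columns C100 and C50), so it should be usable as the data argument of geom_segment in the same way.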
    

    Or you can try a slightly different approach from Julius' answer: put the boxes on a numeric axis X2, offset the two Z levels within each X group, and adjust the width parameter of geom_boxplot so that the boxes of a group are drawn side by side. Then add breaks and labels to recover the expected axis.

    DATA %>% 
      mutate(X2=as.numeric(interaction(Z, X))) %>% 
      mutate(X2=ifelse(Z==100, X2 + 0.2, X2 - 0.2)) %>% 
      ggplot(aes(X2, Y, fill=Z, group=X2)) + 
       geom_boxplot(width=0.6) +
       stat_summary(fun.y = mean, geom = "point") +
       stat_summary(aes(group = X),fun.y = mean, geom = "line") +
       facet_grid(Gp ~ Fc) +
       scale_x_continuous(breaks = c(1.5,3.5), labels = c("B1","B2"),
                            minor_breaks = NULL, limits=c(0.5,4.5))
    

    [image: resulting plot]
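
To see where the numeric positions come from, the X2 computation can be traced by hand in base R. This is only an illustrative sketch: `grid` is a hypothetical helper holding one row per X/Z combination, not an object from the answer.

```r
# One row per box within a facet (hypothetical helper, for illustration only).
grid <- data.frame(
  X = factor(c("B1", "B1", "B2", "B2")),
  Z = factor(c("50", "100", "50", "100"))
)

# interaction(Z, X) varies Z fastest within each X; factor levels sort
# alphabetically, so "100" comes before "50" and the codes run from 1 to 4.
grid$code <- as.numeric(interaction(grid$Z, grid$X))

# Pull the two boxes of each X group towards each other.
grid$X2 <- ifelse(grid$Z == "100", grid$code + 0.2, grid$code - 0.2)

grid

# Midpoint of each X group, i.e. where the "B1" and "B2" axis labels belong.
tapply(grid$X2, grid$X, mean)
```

The boxes end up at 1.2 and 1.8 for B1 and at 3.2 and 3.8 for B2. With width = 0.6 the two boxes in a group just touch, and the group midpoints are the 1.5 and 3.5 used as breaks in scale_x_continuous.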