ggplot: stat_summary for mean with facet


Within ggplot2, I am using the stat_summary() function to calculate and plot the mean and standard deviation of a dataset. I am simultaneously using facet_wrap() to break the dataset into two plots. I was pleasantly surprised that adding facet_wrap() to my ggplot caused stat_summary() to correctly be applied to each subset of the data independently.

> df
| ID        | Group | Strain | Condition | DoublingTime    |
|-----------|-------|--------|-----------|-----------------|
| A_3g_Rep1 | A_3g  | A      | 3g        |     122.4135    |
| A_3g_Rep2 | A_3g  | A      | 3g        |     124.5801    |
| A_3g_Rep3 | A_3g  | A      | 3g        |     124.9419    |
| A_6g_Rep1 | A_6g  | A      | 6g        |     120.5004    |
| A_6g_Rep2 | A_6g  | A      | 6g        |     124.1666    |
| A_6g_Rep3 | A_6g  | A      | 6g        |     124.6453    |
| B_3g_Rep1 | B_3g  | B      | 3g        |     132.568     |
| B_3g_Rep2 | B_3g  | B      | 3g        |     137.5242    |
| B_3g_Rep3 | B_3g  | B      | 3g        |     135.5238    |
| B_6g_Rep1 | B_6g  | B      | 6g        |     137.1333    |
| B_6g_Rep2 | B_6g  | B      | 6g        |     142.733     |
| B_6g_Rep3 | B_6g  | B      | 6g        |     140.0722    |

First, I used the following, which correctly calculates the mean and standard deviation values. However, each facet's x-axis includes groups that aren't present in that facet.

DT_plotA <- ggplot(df, aes(Group, DoublingTime)) +
  stat_summary(fun.data="mean_sdl", fun.args = list(mult=1),
               geom="errorbar", width=0.5) +
  stat_summary(fun=mean, geom="point", size=3) +
  facet_wrap(nrow = 1, .~Strain)

DT_plotA

I was pleasantly surprised that changing the aes() x-value to Condition while keeping facet_wrap() caused stat_summary() to calculate the mean and standard deviation correctly for each Group.

DT_plotB <- ggplot(df, aes(Condition, DoublingTime)) +
  stat_summary(fun.data="mean_sdl", fun.args = list(mult=1),
               geom="errorbar", width=0.5) +
  stat_summary(fun=mean, geom="point", size=3) +
  facet_wrap(nrow = 1, .~Strain)

DT_plotB

However, if facet_wrap is removed from the plot, stat_summary calculates mean and standard deviation based on Condition: data from independent Strains is averaged. I worry that this caveat will be forgotten and lead to incorrect calculation of mean/sd when facet is removed.

DT_plotC <- ggplot(df, aes(Condition, DoublingTime)) +
  stat_summary(fun.data="mean_sdl", fun.args = list(mult=1),
               geom="errorbar", width=0.5) +
  stat_summary(fun=mean, geom="point", size=3)

DT_plotC

Question
Is there a way to generate a plot that looks like DT_plotB but instead uses aes(Group, DoublingTime), as shown in the code for DT_plotA?


Solution

  • One option is to preprocess the data, calculating the mean and sd for each Strain and Condition before plotting:

    library(dplyr)
    library(ggplot2)
    
    df %>% 
      group_by(Strain, Condition) %>% 
      # summarise() returns one row per group, so each point is drawn once
      summarise(mean = mean(DoublingTime),
                sd = sd(DoublingTime),
                .groups = "drop") %>% 
      ggplot(aes(x = Condition, y = mean)) +
      geom_point() +
      # error bars span mean +/- 1 sd
      geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = .2) +
      facet_wrap(. ~ Strain)
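
Another option that keeps aes(Group, DoublingTime) might be to let each facet have its own x-axis via the scales argument of facet_wrap(), so that groups absent from a facet are dropped from its axis. The following is only a sketch: df is rebuilt here from the table in the question, DT_plotD is a made-up name, and mean_sdl needs the Hmisc package to be installed, as in the original code.

```r
library(ggplot2)

# Rebuild the example data from the table in the question.
df <- data.frame(
  Strain       = rep(c("A", "B"), each = 6),
  Condition    = rep(rep(c("3g", "6g"), each = 3), times = 2),
  DoublingTime = c(122.4135, 124.5801, 124.9419,
                   120.5004, 124.1666, 124.6453,
                   132.568,  137.5242, 135.5238,
                   137.1333, 142.733,  140.0722)
)
# Group combines Strain and Condition, e.g. "A_3g".
df$Group <- paste(df$Strain, df$Condition, sep = "_")

# Same layers as DT_plotA; the only change is scales = "free_x",
# which gives each facet its own x-axis with only the groups it contains.
DT_plotD <- ggplot(df, aes(Group, DoublingTime)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
               geom = "errorbar", width = 0.5) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  facet_wrap(~ Strain, nrow = 1, scales = "free_x")

DT_plotD
```

Because the x aesthetic is Group rather than Condition, the summaries should stay per Strain and Condition even if facet_wrap() is later removed, which addresses the concern about DT_plotC.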
    
