
Issue with using stat_summary to produce error bars for line graphs when faceting


I'm trying to add error bars to line graphs using stat_summary in ggplot2, but it doesn't work when I facet the graphs.

My data:

    date week year location imidacloprid block wickhami virescens sexta
1 15-May    1 2015  kinston           tp     1        0         0     0
2 15-May    1 2015  kinston           gh     1        0         0     0
3 15-May    1 2015  kinston          utc     1        0         0     0
4 15-May    1 2015  kinston           gh     2        0         0     0
5 15-May    1 2015  kinston          utc     2        0         0     0
6 15-May    1 2015  kinston           tp     2        0         0     0


'data.frame':   576 obs. of  9 variables:
 $ date        : Factor w/ 27 levels "1-Jul","12-Jun",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ week        : Factor w/ 12 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ year        : Factor w/ 2 levels "2015","2016": 1 1 1 1 1 1 1 1 1 1 ...
 $ location    : Factor w/ 2 levels "kinston","rocky mount": 1 1 1 1 1 1 1 1 1 1 ...
 $ imidacloprid: Factor w/ 3 levels "gh","tp","utc": 2 1 3 1 3 2 3 2 1 2 ...
 $ block       : Factor w/ 4 levels "1","2","3","4": 1 1 1 2 2 2 3 3 3 4 ...
 $ wickhami    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ virescens   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ sexta       : num  0 0 0 0 0 0 0 0 0 0 ...

Summarizing the data for graphing:

wickhami_sum = summarySE(bug_subset_final, 
                measurevar="wickhami", 
                groupvars=c("imidacloprid","week","year"))

   imidacloprid week year N wickhami         sd          se         ci
1            gh    1 2015 8   0.0000  0.0000000  0.00000000  0.0000000
2            gh    1 2016 8   0.0000  0.0000000  0.00000000  0.0000000
3            gh    2 2015 8   0.0000  0.0000000  0.00000000  0.0000000
4            gh    2 2016 8   0.0000  0.0000000  0.00000000  0.0000000
5            gh    3 2015 8   0.0000  0.0000000  0.00000000  0.0000000
6            gh    3 2016 8   0.1250  0.2314550  0.08183171  0.1935012
7            gh    4 2015 8   0.0000  0.0000000  0.00000000  0.0000000
8            gh    4 2016 8   0.5000  0.4629100  0.16366342  0.3870025
9            gh    5 2015 8   0.5000  0.3779645  0.13363062  0.3159862

The code below runs without issue: it produces a line graph of my two years of data combined, with error bars from stat_summary:

ggplot(wickhami_sum, aes(x=week, y=wickhami,linetype=imidacloprid,group=imidacloprid))+
  stat_summary(fun.data=mean_se,geom="errorbar",width=.2,color="black",position=position_dodge(0.2))+
  stat_summary(fun.y=mean,geom="line",position=position_dodge(0.2))

However, when I try to facet the data by year (as below), stat_summary produces no error bars and I get the warning shown underneath:

ggplot(wickhami_sum, aes(x=week, y=wickhami,linetype=imidacloprid,group=imidacloprid))+
  stat_summary(fun.y=mean,geom="line",position=position_dodge(0.2))+facet_grid(year~.)+
  stat_summary(fun.data=mean_se,geom="errorbar",width=.2,color="black",position=position_dodge(0.2))

Warning message:
Removed 72 rows containing missing values (geom_errorbar). 

I've tried expanding the y-axis limits to include the error bars, but I still get the same warning and no error bars. I'd like to use stat_summary to produce the error bars for the faceted graphs rather than calculate standard errors again. Any help understanding why faceting stops stat_summary from working, or what I'm doing wrong, is appreciated.


Solution

  • Here's what I think is happening: there are two rows of data per week (for each treatment) in the unfaceted plot, but only one row per week in each panel of the faceted plot, so the standard error calculation returns NA. stat_summary is intended for unsummarized data and does the data summaries internally. Use bug_subset_final with stat_summary, or switch to geom_errorbar to continue using wickhami_sum. Details below.

    You've pre-summarized the data, but stat_summary is intended to work on the raw data and calculate the summary values internally. In the summary data frame wickhami_sum that you've passed to ggplot, each treatment has two rows per week: one for that week in 2015 and one for that week in 2016. The summary operation has collapsed all of the raw observations down to a single row for each combination of treatment, week and year.

    Thus, in the unfaceted plot, stat_summary has two rows of data to operate on for each week. But in the faceted plot, it's trying to calculate a standard error from a single observation, which is probably returning NA, hence nothing gets plotted. Even in the unfaceted plot, your error bars are being calculated from just the two yearly means for each week, which isn't what you want either.
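
To make the NA explanation above concrete, here is a minimal base R sketch. The helper `se()` is written out here for illustration and is an assumption about what `mean_se` computes (the square root of the variance divided by the sample size); the input values are borrowed from the week 3 rows of the summary table.

```r
# Standard error of the mean, written out in base R.
# Assumption: this mirrors the calculation behind mean_se().
se <- function(x) sqrt(var(x) / length(x))

# Unfaceted plot: two pre-summarized means per week (one per year).
se_two <- se(c(0, 0.125))

# Faceted plot: only one pre-summarized mean per week in each panel.
# The variance of a single value is undefined, so the result is NA.
se_one <- se(0.125)

print(se_two)
print(se_one)
```

With a single value, `var()` returns NA, so both ends of the error bar are NA and the layer is dropped, which would match the "Removed 72 rows containing missing values" warning.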

    Instead, either keep using wickhami_sum but replace stat_summary with geom_errorbar, using the se column that summarySE already computed:

    geom_errorbar(aes(ymin = wickhami - se, ymax=wickhami + se))
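
As a fuller sketch of the geom_errorbar approach, the example below builds a small, made-up stand-in for wickhami_sum (the values are random, not the real data) and draws the faceted line graph with error bars taken from the precomputed `se` column. The column names follow the summary table in the question; everything else is illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for wickhami_sum: one row per treatment, week and year.
set.seed(1)
wickhami_sum <- expand.grid(
  imidacloprid = c("gh", "tp", "utc"),
  week = factor(1:3),
  year = factor(c(2015, 2016))
)
wickhami_sum$wickhami <- runif(nrow(wickhami_sum), 0, 1)    # made-up means
wickhami_sum$se <- runif(nrow(wickhami_sum), 0.05, 0.2)     # made-up standard errors

dodge <- position_dodge(0.2)

p <- ggplot(wickhami_sum,
            aes(x = week, y = wickhami,
                linetype = imidacloprid, group = imidacloprid)) +
  geom_line(position = dodge) +
  # Error bars come straight from the summary columns; no stat is involved.
  geom_errorbar(aes(ymin = wickhami - se, ymax = wickhami + se),
                width = 0.2, color = "black", position = dodge) +
  facet_grid(year ~ .)

print(p)
```

Because the data are already summarized, geom_line replaces the stat_summary line layer as well.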
    

    Or, use the raw data (which looks like it's called bug_subset_final) with stat_summary:

    ggplot(bug_subset_final, aes(x=week, y=wickhami)) +
      stat_summary(fun.data=mean_se, geom="errorbar")
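
For the raw-data route, a complete faceted version might look like the sketch below. The data frame `raw` is a made-up stand-in for bug_subset_final (random counts, four blocks per treatment, week and year), so only the structure of the call is meant to carry over. It uses the `fun` argument, which replaced `fun.y` in ggplot2 3.3.0; on older versions `fun.y = mean` would be needed instead.

```r
library(ggplot2)

# Hypothetical stand-in for bug_subset_final: several raw observations
# (4 blocks) for every treatment, week and year.
set.seed(42)
raw <- expand.grid(
  imidacloprid = c("gh", "tp", "utc"),
  week = factor(1:3),
  year = factor(c(2015, 2016)),
  block = factor(1:4)
)
raw$wickhami <- rpois(nrow(raw), lambda = 1)  # made-up counts

dodge <- position_dodge(0.2)

p_raw <- ggplot(raw,
                aes(x = week, y = wickhami,
                    linetype = imidacloprid, group = imidacloprid)) +
  # stat_summary computes the mean and standard error within each panel.
  stat_summary(fun = mean, geom = "line", position = dodge) +
  stat_summary(fun.data = mean_se, geom = "errorbar",
               width = 0.2, color = "black", position = dodge) +
  facet_grid(year ~ .)

print(p_raw)
```

Here each panel has several observations per treatment and week, so the standard error is defined and the error bars are drawn.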