Tags: r, ggplot2, time-series, mean, line-plot

How to plot the grand mean in ggplot


I am trying to plot 35 individual time series (102 data points each) using ggplot and geom_line. I'd also like to overlay the grand mean of the individual series across time as a second geom_line with either a different color or a different alpha.

Here is a sample from my data:

> dput(head(mdata, 10))
structure(list(Individual = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), Signal = c(-0.132894911, -0.13, 0, 0, 0, 0.02, 0.01, 
0.01, 0, 0.02), Time = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 
0.8, 0.9)), row.names = c(NA, 10L), class = "data.frame")

I've done this before with summarySE; however, it is no longer compatible with the current version of R. I've tried using two separate data frames (one with the individual data and one with the mean data) and overlaying them, but I think that because I've melted the individual data (from a 35x102 data frame to a 3x3570 one), I am getting an error that says:

"Aesthetics must be either length 1 or the same as the data (102): group".

Then I tried using stat_summary and fun.data, but I am still getting an error that says:

Error: geom_line requires the following missing aesthetics: y

ggplot(data=mdata,aes(x=Time, y=Signal, group=Individual, ymin=-1, ymax=3))+ 
  geom_line()+
  stat_summary(fun.data="mean", geom="line", color = "red")

Here is a dropbox link to the example data frame and graph I need as an output.

Any advice would be greatly appreciated! I've seen similar problems elsewhere, but I think the fact I am grouping my data within the aesthetic is causing me problems.


Solution

  • You can add a second geom_line() layer drawn from a summary data frame.

    # Let's create the summary using `dplyr`
    library(dplyr)
    avg_group <- mdata %>% 
      select(Individual, Signal, Time) %>%
      group_by(Individual) %>% 
      summarise(avg_ind = mean(Time), avg_sig = mean(Signal))
    # -------------------------------------------------------------------------
    # > avg_group
    # # A tibble: 35 x 3
    # Individual avg_ind avg_sig
    # <int>   <dbl>   <dbl>
    # 1          1    5.05  0.107 
    # 2          2    5.05  0.0947
    # 3          3    5.05  0.0781
    # 4          4    5.05  0.0362
    # 5          5    5.05  0.0156
    # 6          6    5.05  0.0182
    # 7          7    5.05  0.774 
    # 8          8    5.05  0.297 
    # 9          9    5.05  0.517 
    # 10         10    5.05  0.685 
    # # … with 25 more rows
    # -------------------------------------------------------------------------
    # Then plot the graph using 
    ggplot(mdata,aes(x=Time, y=Signal, group=Individual, ymin=-1, ymax=3))+ 
      geom_line() + 
      geom_line(data = avg_group, aes(avg_ind, avg_sig), group = 1, color = "red") + theme_bw()
    # -------------------------------------------------------------------------
    

    Output

    avg_time_signal
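Note that the summary above groups by Individual, so it yields one mean per individual (all at the same mean Time) rather than one mean per time point. If the goal is the grand mean across individuals at each time point, grouping by Time may be closer to what is needed. The following is a minimal sketch under that assumption; `toy` and `grand_mean` are hypothetical names, and the toy values stand in for `mdata`.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for mdata (hypothetical values): 3 individuals, 5 time points each
set.seed(1)
toy <- data.frame(
  Individual = rep(1:3, each = 5),
  Time       = rep(seq(0, 0.4, by = 0.1), times = 3),
  Signal     = rnorm(15)
)

# One row per time point: the mean Signal across all individuals
grand_mean <- toy %>%
  group_by(Time) %>%
  summarise(avg_sig = mean(Signal))

# inherit.aes = FALSE stops the summary layer from looking for `Individual`,
# which does not exist in grand_mean
p <- ggplot(toy, aes(x = Time, y = Signal, group = Individual)) +
  geom_line(alpha = 0.4) +
  geom_line(data = grand_mean, aes(x = Time, y = avg_sig),
            inherit.aes = FALSE, color = "red") +
  theme_bw()
p
```

With the real data, replacing `toy` with `mdata` should give a summary with one row per time point, so the red line follows the time axis instead of collapsing to a single x position.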

    If you prefer stat_summary(), add a constant column to the data frame and use it as the grouping aesthetic for the summary layer, so that the mean is computed across all individuals rather than within each one:

    # > head(mdata, 2)
    # Individual     Signal Time
    # 1          1 -0.1328949  0.0
    # 2          1 -0.1300000  0.1
    # ------------------------------------------------------------------------
    mdata$grand <- 1 
    
    # > head(mdata, 2)
    # Individual     Signal Time grand
    # 1          1 -0.1328949  0.0     1
    # 2          1 -0.1300000  0.1     1
    # ------------------------------------------------------------------------
    # plot using grand as an explicit variable to group the summary layer
    # (`fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0)
    ggplot(mdata,aes(x=Time, y=Signal, group=Individual, ymin=-1, ymax=3))+ 
      geom_line() + stat_summary(aes(group = grand), fun="mean", geom="line", color = "red") + theme_bw()
    

    Output

    output_stat_summary
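The error in the question ("missing aesthetics: y") is probably due to `fun.data`, which expects a function that returns a data frame with `y`, `ymin` and `ymax`, whereas `mean` returns a single number. A function like that belongs in `fun` (named `fun.y` before ggplot2 3.3.0). The sketch below also shows that a constant `group = 1` inside `aes()` can stand in for the extra column. `toy` is a hypothetical stand-in for `mdata`.

```r
library(ggplot2)

# Toy stand-in for mdata (hypothetical values): 3 individuals, 5 time points each
set.seed(1)
toy <- data.frame(
  Individual = rep(1:3, each = 5),
  Time       = rep(seq(0, 0.4, by = 0.1), times = 3),
  Signal     = rnorm(15)
)

p <- ggplot(toy, aes(x = Time, y = Signal, group = Individual)) +
  geom_line(alpha = 0.4) +
  # group = 1 overrides the per-individual grouping for this layer only;
  # `fun` takes a function that reduces the y values at each x to one number
  stat_summary(aes(group = 1), fun = mean, geom = "line", color = "red") +
  theme_bw()
p
```

This avoids modifying the data frame, at the cost of requiring a ggplot2 version that supports the `fun` argument.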

    To make something like the output you expect (as shown in the link you shared), you can draw a rectangle with geom_rect():

    # Compute the left edge first: `xmin` is not visible to `xmax` inside the same call
    x_start <- mean(mdata$Time) + se(mdata$Time)
    ggplot(data=mdata,aes(x=Time, y=Signal, group=Individual, ymin=-1, ymax=3))+ 
      geom_line()+ 
      geom_rect(xmin = x_start, xmax = x_start + 0.4, fill = "red", ymax = -0.94, ymin = -1) + theme_bw()
    

    A caveat about this output: not everything in it comes from the data, although the grand mean and standard error are used to position the rectangle.

    Output

    output_geom_rec

    You may refer here for the se function.
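Since `se()` is not part of base R, it has to be defined before the geom_rect() example will run. A common definition of the standard error of the mean is sketched below. This is an assumption: the helper the answer links to may differ, for example in how it handles missing values.

```r
# Standard error of the mean: sample standard deviation divided by sqrt(n).
# Assumed definition; the helper referenced in the answer may differ.
se <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]  # optionally drop missing values first
  sd(x) / sqrt(length(x))
}

# Example call on a small vector
se(c(1, 2, 3, 4, 5))
```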