Search code examples
rdplyrzoorollapply

Evaluate rolling mean by group when groups have insufficient observations


I am trying to calculate a bunch of rolling means for a dataset -

     date name px_last 
2012-12-04   A  6.81  
2012-12-05   A  4.28  
2012-12-06   A  4.32
2012-12-04   A  6.89  
2012-12-05   A  7.24    
2012-12-04   B  6.81  
2012-12-05   B  9.38  
2012-12-06   B  3.62
2012-12-04   C  3.85  
2012-12-05   C  4.23  
2012-12-06   C  7.32
2012-12-04   C  2.84  
2012-12-05   C  8.38

The is the code I am using -

vec = c(2, 3, 4, 5)
for(i in c(1:length(vec))){
  varname = paste0("ma", vec[i])
  df = df %>% group_by(name) %>% mutate(!!varname := rollapplyr(px_last, vec[i], mean, na.rm = TRUE, fill = "NA"))
}

The error I get from the third iteration is -

Error in mutate_impl(.data, dots) : 
Evaluation error: wrong sign in 'by' argument.

From this post - Clueless about this error: wrong sign in 'by' argument - I understand that is because some groups have insufficient rows; in this example, group with name = B

How do I got about calculating the means without deleting the groups with insufficient observations? I would like NAs preferably in places where the mean cannot be calculated due to insufficient observations.


Solution

  • @phil_t So the issue is that the rollapply is attempting to apply the mean in a right-aligned fashion. When this is attempted on a window of size 4 on group B, which only has 3 values it correctly fails.

    So you will need to "regularize" the DF to have the required number of rows for each group.

    library(plyr)
    > df <- ddply(df, 'name', function(x) {
                     rbind.fill(x, data.frame(name = rep(unique(x$name), each = max(vec) - nrow(x))))
                  })
    
    > df
      date name px_last
      2012-12-04    A    6.81
      2012-12-05    A    4.28
      2012-12-06    A    4.32
      2012-12-04    A    6.89
      2012-12-05    A    7.24
      2012-12-04    B    6.81
      2012-12-05    B    9.38
      2012-12-06    B    3.62
      <NA>    B      NA
      <NA>    B      NA
      2012-12-04    C    3.85
      2012-12-05    C    4.23
      2012-12-06    C    7.32
      2012-12-04    C    2.84
      2012-12-05    C    8.38
    
    > for(i in c(1:length(vec))){
        varname = paste0("ma", vec[i])
        df = df %>% group_by(name) %>% mutate(!!varname := rollapplyr(px_last, vec[i], mean, na.rm = TRUE, fill = "NA"))
      }
    

    That should help.