I am trying to calculate a bunch of rolling means for a dataset -
date name px_last
2012-12-04 A 6.81
2012-12-05 A 4.28
2012-12-06 A 4.32
2012-12-04 A 6.89
2012-12-05 A 7.24
2012-12-04 B 6.81
2012-12-05 B 9.38
2012-12-06 B 3.62
2012-12-04 C 3.85
2012-12-05 C 4.23
2012-12-06 C 7.32
2012-12-04 C 2.84
2012-12-05 C 8.38
The is the code I am using -
vec = c(2, 3, 4, 5)
for(i in c(1:length(vec))){
varname = paste0("ma", vec[i])
df = df %>% group_by(name) %>% mutate(!!varname := rollapplyr(px_last, vec[i], mean, na.rm = TRUE, fill = "NA"))
}
The error I get from the third iteration is -
Error in mutate_impl(.data, dots) :
Evaluation error: wrong sign in 'by' argument.
From this post - Clueless about this error: wrong sign in 'by' argument - I understand that is because some groups have insufficient rows; in this example, group with name = B
How do I got about calculating the means without deleting the groups with insufficient observations? I would like NAs preferably in places where the mean cannot be calculated due to insufficient observations.
@phil_t So the issue is that the rollapply
is attempting to apply the mean
in a right-aligned
fashion. When this is attempted on a window of size 4 on group B
, which only has 3 values it correctly fails.
So you will need to "regularize" the DF to have the required number of rows for each group.
library(plyr)
> df <- ddply(df, 'name', function(x) {
rbind.fill(x, data.frame(name = rep(unique(x$name), each = max(vec) - nrow(x))))
})
> df
date name px_last
2012-12-04 A 6.81
2012-12-05 A 4.28
2012-12-06 A 4.32
2012-12-04 A 6.89
2012-12-05 A 7.24
2012-12-04 B 6.81
2012-12-05 B 9.38
2012-12-06 B 3.62
<NA> B NA
<NA> B NA
2012-12-04 C 3.85
2012-12-05 C 4.23
2012-12-06 C 7.32
2012-12-04 C 2.84
2012-12-05 C 8.38
> for(i in c(1:length(vec))){
varname = paste0("ma", vec[i])
df = df %>% group_by(name) %>% mutate(!!varname := rollapplyr(px_last, vec[i], mean, na.rm = TRUE, fill = "NA"))
}
That should help.