I want to calculate moving averages for overlapping generations. For example, the value for 1915 should be the mean of the years 1900-1930, the value for 1916 the mean of the years 1901-1931, and so on. I wrote the following function and loop:
calc_mean = function(data_frame, yr, time_generation){
  df_MM = data_frame %>%
    filter(yr >= year & yr < year + time_generation) %>%
    summarize(school_mean = mean(school, na.rm = TRUE)) %>%
    mutate(year = year + gen_interval/2)
  return(df_MM)
}
time_generation = 30;
# Preallocation
df_mean = data.frame()
for(year in seq(from = 1900, to = 1960, by = 1)){
  df_MM = calc_mean(df_school, yr = year, time_generation)
  df_mean = rbind(df_mean, df_MM)
}
remove(df_MM)
However, when I cross-check it against a small sample, I get wrong values. Do you see my error?
Let me give you a small sample to check on your own:
set.seed(2)
df_school <- data.frame(year = 1900:1960, val = sort(runif(61)))
Assuming you have no gaps in your data, you can use the rolling functions from the zoo package:
set.seed(42)
x <- data.frame(year = 2000:2010, val = sort(runif(11)))
x$rollavg <- zoo::rollmean(x$val, k=3, fill=NA, align="center")
x$rollavg2 <- zoo::rollapply(x$val, FUN=mean, width=3, align="center", partial=TRUE)
x
#    year       val   rollavg  rollavg2
# 1  2000 0.1346666        NA 0.2104031
# 2  2001 0.2861395 0.2928493 0.2928493
# 3  2002 0.4577418 0.4209924 0.4209924
# 4  2003 0.5190959 0.5395277 0.5395277
# 5  2004 0.6417455 0.6059446 0.6059446
# 6  2005 0.6569923 0.6679342 0.6679342
# 7  2006 0.7050648 0.6995485 0.6995485
# 8  2007 0.7365883 0.7573669 0.7573669
# 9  2008 0.8304476 0.8272807 0.8272807
# 10 2009 0.9148060 0.8941097 0.8941097
# 11 2010 0.9370754        NA 0.9259407
where rollavg is the standard rolling mean, which returns NA when too few data points are available for a full window, and rollavg2 is provided if you want averages over incomplete windows at the edges.
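Applied to the sample data from the question, the same approach might look like the sketch below. It assumes a centered window of 31 years (1900-1930 inclusive is 31 values, not 30), that the column to average is val as in the sample, and that there is one row per year with no gaps. The names k, gen_mean and manual_1915 are only illustrative.

```r
library(zoo)

# Sample data from the question
set.seed(2)
df_school <- data.frame(year = 1900:1960, val = sort(runif(61)))

# Assumption: a centered 31-year window, so 1915 covers 1900-1930
k <- 31
df_school$gen_mean <- zoo::rollmean(df_school$val, k = k, fill = NA, align = "center")

# Cross-check a single year by hand against the rolling result
manual_1915 <- mean(df_school$val[df_school$year >= 1900 & df_school$year <= 1930])
all.equal(df_school$gen_mean[df_school$year == 1915], manual_1915)
```

The first and last 15 years come back as NA because a full window is not available there. As for the original function, a likely cause of the wrong values is the filter condition: inside filter(), year is the data column and yr is the function argument, so yr >= year & yr < year + time_generation keeps the years up to and including yr rather than the years starting at yr. In addition, gen_interval is never defined in the code shown, and the sample data has a val column rather than school.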