
How to calculate moving averages in R?


I want to calculate moving averages for overlapping generations. For example, the value for the year 1915 should be the mean of the years 1900-1930, the value for 1916 the mean of the years 1901-1931, and so on. I wrote the following function and loop:

calc_mean = function(data_frame, yr, time_generation){
  df_MM = data_frame %>% 
  filter(yr >= year & yr < year + time_generation) %>% 
  summarize(school_mean = mean(school, na.rm = TRUE)) %>% 
  mutate(year = year + gen_interval/2)

return(df_MM)
}
time_generation = 30;

# Preallocation
df_mean = data.frame()


for(year in seq(from = 1900, to = 1960, by = 1)){

  df_MM = calc_mean(df_school, yr = year, time_generation)

  df_mean = rbind(df_mean, df_MM)
}

remove(df_MM)

However, when I cross-check the results against a small sample, I get wrong values. Do you see my error?

Here is a small sample you can check yourself:

set.seed(2)
df_school <- data.frame(year = 1900:1960, val = sort(runif(61)))

Solution

  • Assuming you have no gaps in your data, you can use the rolling-window functions in the zoo package:

    set.seed(42)
    x <- data.frame(year = 2000:2010, val = sort(runif(11)))
    
    x$rollavg <- zoo::rollmean(x$val, k=3, fill=NA, align="center")
    x$rollavg2 <- zoo::rollapply(x$val, FUN=mean, width=3, align="center", partial=TRUE)
    x
    #    year       val   rollavg  rollavg2
    # 1  2000 0.1346666        NA 0.2104031
    # 2  2001 0.2861395 0.2928493 0.2928493
    # 3  2002 0.4577418 0.4209924 0.4209924
    # 4  2003 0.5190959 0.5395277 0.5395277
    # 5  2004 0.6417455 0.6059446 0.6059446
    # 6  2005 0.6569923 0.6679342 0.6679342
    # 7  2006 0.7050648 0.6995485 0.6995485
    # 8  2007 0.7365883 0.7573669 0.7573669
    # 9  2008 0.8304476 0.8272807 0.8272807
    # 10 2009 0.9148060 0.8941097 0.8941097
    # 11 2010 0.9370754        NA 0.9259407
    

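    Here rollavg is the standard rolling mean, which returns NA wherever the window is incomplete (the first and last rows for a width of 3). rollavg2 uses partial=TRUE, so at the edges it averages whatever values are available; use it if you want incomplete averages.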
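Applied to the sample from the question, a window of 1900-1930 around 1915 covers 31 values, so the width would be 31 rather than 30. The following is a minimal base-R sketch under that assumption, useful as a cross-check that needs no packages. The names `window_mean` and `half_width` are made up for illustration, and it reads the `val` column of the sample rather than `school`.

```r
set.seed(2)
df_school <- data.frame(year = 1900:1960, val = sort(runif(61)))

# Assumption: 1915 should average 1900-1930, i.e. 15 years on either side.
half_width <- 15

# Hypothetical helper: centered mean for one year.
# Returns NA when fewer than 2 * half_width + 1 rows fall inside the window.
window_mean <- function(df, centre, half_width) {
  in_window <- df$year >= centre - half_width & df$year <= centre + half_width
  if (sum(in_window) < 2 * half_width + 1) return(NA_real_)
  mean(df$val[in_window], na.rm = TRUE)
}

df_school$rollavg <- vapply(
  df_school$year,
  function(y) window_mean(df_school, y, half_width),
  numeric(1)
)

# Manual cross-check for 1915 against the plain mean of 1900-1930.
stopifnot(isTRUE(all.equal(
  df_school$rollavg[df_school$year == 1915],
  mean(df_school$val[df_school$year %in% 1900:1930])
)))

# Optional comparison with zoo, only run if the package is installed.
if (requireNamespace("zoo", quietly = TRUE)) {
  zoo_avg <- zoo::rollmean(df_school$val, k = 31, fill = NA, align = "center")
  print(all.equal(as.numeric(zoo_avg), df_school$rollavg))
}
```

Because the helper selects rows by `year` instead of by position, a missing year shows up as NA instead of silently shifting the window. As for the original function, as far as I can tell three things go wrong: inside `filter()` the name `year` refers to the data frame column while `yr` is the argument, so the condition keeps the 30 years ending at `yr` rather than starting there; `gen_interval` is not defined in the code shown; and the function reads `school` while the sample column is called `val`.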