Search code examples
rsliderrolling-average

Rolling average that skips a predetermined number of rows


Good day/evening all,

Hope you are well. I would be grateful if someone could give me some hints about how to solve this problem.

I have variable x with dates attached. I would like to create a rolling average of x, but I would like the average to skip the most recent ten dates, and instead take the average of the ten dates prior to that. The average should also be grouped by a categorical variable.

Here is a minimally reproducible example:

set.seed(24)
dat <- data.frame(id=1:93,
      date = rep(seq(as.Date("1990/1/1"),as.Date("2020/1/1"),"years"),each=3),
      group = rep(LETTERS[1:3],31),
      x=rnorm(93))

For example, the new variable, rolling, for 2020, group A (id = 91), should be the average from 2001 through 2010, but only from A years. (Rolling averages for 1990-1999 don’t matter in this example.)

I’ve gotten as far as the grouped rolling average, but I haven’t figured out how to “skip” yet. I think I probably just don’t know the right vocabulary. The slider package has worked best for me so far. This code just makes a rolling average of the past ten dates:

library(dplyr)
library(slider)
newdat <- dat %>% group_by(group) %>%
  mutate(rolling = slide_dbl(x, mean, .after = 9))

I am at your disposal for any further information. Thank you for reading and for any tips!


Solution

  • In this case we want the window to include items from between 19 and 10 (inclusive) observations prior, so we can set .before = 19, .after = -10 to reference that range.

    Help for slider::slide_*:

    .before, .after
    ⁠[integer(1) / Inf]⁠

    The number of values before or after the current element to include in the sliding window. Set to Inf to select all elements before or after the current element. Negative values are allowed, which allows you to "look forward" from the current element if used as the .before value, or "look backwards" if used as .after.

    (Btw, if you might have missing or inconsistently spaced dates, you might also consider the slide_index_dbl function to make the time window explicit.)

    dat %>%
      group_by(group) %>%
      mutate(rolling = slider::slide_dbl(x, mean, .before = 19, .after = -10)) %>%
      filter(id == 91)
    
    
    # A tibble: 1 × 5
    # Groups:   group [1]
         id date       group      x rolling
      <int> <date>     <chr>  <dbl>   <dbl>
    1    91 2020-01-01 A     -0.233  -0.388
    

    Check

    dat %>%
      filter(group == "A", 
             date >= as.Date("2001-01-01"),
             date <= as.Date("2010-12-31")) %>%
      summarize(rolling = mean(x))
    
         rolling
    1 -0.3883791