r, time-series, lag, rolling-average

How to calculate the rolling mean of the nth previous values


Say I have a data frame of values, each assigned to one of two groups ('Gp'):

set.seed(12)
df <- data.frame(id = sample(1:50,50), Gp = sample(2, 50, TRUE)) 

Here are the first 20 rows:

df
row  id Gp
1   4  1
2  41  1
3  46  1
4  13  1
5   8  2
6   2  2
7  48  2
8  28  2
9   1  2
10 42  2
11 16  2
12 32  1
13 15  2
14 38  2
15 10  1
16 40  1
17 35  1
18 18  2
19 22  1
20 50  2

What I would like to do is take the rolling average of the values 10, 11 and 12 places before the current row, and then subtract that average from the current row's value. I would also like to fill with NA, group by another column ('Gp') and align right.

I've seen some other posts and have been able to do this with dplyr minus the nth previous values part:

library(dplyr)
library(zoo)

df2 <- df %>% group_by(Gp) %>%
  mutate(rm = rollmeanr(id, k = 3, fill = NA)) %>%
  ungroup()

But this just goes by the current row and the 2 rows before it. In the example above, what I want to do is, for example for row 15:

row 15 - (row 5 + row 4 + row 3)/3

= 10 - (8 + 13 + 46) / 3 ≈ -12.3

And I would like it to be placed in a new column appended to the original df with a new name. I would also like to use dplyr if possible.

For this example, this row of the desired output df is:

row id Gp rm
15 10  1 -12

I'm thinking lag() may be helpful, but I will need the 3 lagged values together, and I can see this getting messy without cleaner code.

Similar questions:

Calculating Mean for previous 6 nth data point

Moving mean for the following days not including the current day

Calculate the moving average of nth data points over a time series with NAs r


Solution

  • You could lag the id variable before taking the rolling mean:

    library(dplyr)
    
    df |> 
      group_by(Gp) |>
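      # rollapplyr() rather than rollmeanr(): the zoo docs note that rollmean()
      # does not handle inputs containing NAs, and lag() introduces leading NAs
      mutate(rm = id - zoo::rollapplyr(lag(id, 10), width = 3, FUN = mean, fill = NA)) |>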
      ungroup()
    

    Update: fixed a typo and added group_by().
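
To sanity-check what the lag-then-roll approach should produce, here is a minimal base R sketch that computes the same quantity directly from indices, with no package dependencies. The helper name `diff_from_lagged_mean` and its `lags` argument are illustrative and not part of dplyr or zoo. It assumes the intended result is the current value minus the mean of the values 10, 11 and 12 positions earlier within the same group, with NA wherever fewer than 12 earlier values exist.

```r
# Hypothetical helper, not part of the original answer: for each position i,
# return x[i] minus the mean of x[i - 10], x[i - 11] and x[i - 12].
# Positions with fewer than 12 earlier values get NA.
diff_from_lagged_mean <- function(x, lags = 10:12) {
  vapply(seq_along(x), function(i) {
    idx <- i - lags
    if (any(idx < 1)) return(NA_real_)
    x[i] - mean(x[idx])
  }, numeric(1))
}

# Ungrouped check on a simple sequence: position 13 gives 13 - mean(1, 2, 3).
res <- diff_from_lagged_mean(1:15)

# Grouped version on the question's data: ave() applies the helper within each Gp
# and returns the results in the original row order.
set.seed(12)
df <- data.frame(id = sample(1:50, 50), Gp = sample(2, 50, TRUE))
df$rm <- ave(as.numeric(df$id), df$Gp, FUN = diff_from_lagged_mean)
```

Comparing `df$rm` against the `rm` column from the dplyr pipeline is one way to confirm that the leading NA values from `lag()` are handled as intended.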