
How to get the cummean of the last n rows before the last row using dplyr


If I have a data frame that I am processing with dplyr, how can I get the mean (something like cummean) of the 2 rows immediately before the current row?

For example...

    | value |
    |-------|
    | 1     |
    | 2     |
    | 3     |
    | 4     |

In this case, when the code reaches the value 4, I want the mean to be calculated as (2 + 3) / 2.

I was thinking of something along these lines:

new_df <- df %>%
    summarise(
       roll_mean = cummean(value, -2)  # errors: cummean() takes only one argument
    )

However, I know this does not work.
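For context, `dplyr::cummean()` accepts only the vector itself and returns the running mean from the first element up to each position, so it cannot restrict the average to a trailing window. A quick check:

```r
library(dplyr)

# cummean() averages every value from the start up to the current position
running <- cummean(c(1, 2, 3, 4))
print(running)
# [1] 1.0 1.5 2.0 2.5
```

The last value happens to be 2.5 only because (1 + 2 + 3 + 4) / 4 equals (2 + 3) / 2 for this sequence; at row 3, cummean() gives 2 rather than the desired (1 + 2) / 2 = 1.5.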


Solution

  • Using this data: df <- data.frame(value = 1:4)

    For two prior values, you could do it manually like so:

    library(dplyr)

    df %>% 
      # average of the previous value and the one before it
      mutate(roll_mean = (lag(value) + lag(value, 2)) / 2)
    
    #  value roll_mean
    #1     1        NA
    #2     2        NA
    #3     3       1.5
    #4     4       2.5
    

    Or, if the window might be larger, it could be easier to use slider::slide_*(), a nice set of window functions. Here we ask for the mean over a window that starts two elements before the current one (.before = 2) and ends one element before it (.after = -1).

    df %>%
      mutate(roll_mean = slider::slide_dbl(value, mean, .before = 2, .after = -1))
    
    #  value roll_mean
    #1     1       NaN
    #2     2       1.0
    #3     3       1.5
    #4     4       2.5
    

    By default, slider allows incomplete windows; to get the same output as the lag() version above, we can set .complete = TRUE:

    df %>%
      mutate(roll_mean = slider::slide_dbl(value, mean, .before = 2, .after = -1, .complete = TRUE))
    
    #  value roll_mean
    #1     1        NA
    #2     2        NA
    #3     3       1.5
    #4     4       2.5
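Since the title asks about the last n rows, the lag() approach can also be generalized without slider. The helper below, mean_of_prior() (a hypothetical name, not a dplyr function), is a minimal sketch that stacks n lagged copies of the column and averages across them. It returns NA wherever fewer than n prior values exist, which matches the .complete = TRUE behaviour:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): mean of the n values that come
# immediately before each element; NA when fewer than n prior values exist.
mean_of_prior <- function(x, n) {
  stopifnot(n >= 1)
  # One column per lag: column k holds x shifted down by k positions
  lagged <- do.call(cbind, lapply(seq_len(n), function(k) dplyr::lag(x, k)))
  # Any NA in a row (an incomplete window) makes that row's mean NA
  rowMeans(lagged)
}

df <- data.frame(value = 1:4)

df %>%
  mutate(
    roll_mean_2 = mean_of_prior(value, 2),
    roll_mean_3 = mean_of_prior(value, 3)
  )
#   value roll_mean_2 roll_mean_3
# 1     1          NA          NA
# 2     2          NA          NA
# 3     3         1.5          NA
# 4     4         2.5           2
```

This builds an n-column matrix, so for large windows the slider version is likely to be more memory-friendly; for a handful of lags the difference should be negligible.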