Tags: r, dplyr, zoo

In group descending rolling averages using dplyr


I need to calculate variations of rolling averages by group (group_by) in R dplyr.

The variation is that I want to calculate "descending" rolling averages: within each group, the average of data points 1 to N, then 2 to N, then 3 to N, and so on up to N to N.

A simple example:

library(dplyr)

the_df <- tibble(id = c(rep('x', 11), rep('y', 11)), a = rep(10:0, 2), b = rep(seq(100, 0, by = -10), 2))

the_df <- the_df %>% 
          group_by(id) %>% 
          mutate(lead_time = row_number(), 
                 inverse_lead_time = 11-lead_time)

the_df %>%
  arrange(id, (lead_time)) %>%
  group_by(id) %>%
  mutate_at(vars(a,b), list(run_avg = ~ zoo::rollmean(., lead_time, fill = NA, align='left')))

The align='left' is included so the rolling average runs to the end of the group.

But, I'm not getting the output I expect.

The output of the last statement is: [image not included]

In this case, I expect the column "a_run_avg" to increase with lead_time.

To reiterate, what I want is at lead_time = 5 (for example) the running average to be the average of column a to include all values a between lead_time = 5 and lead_time = 10.

The rolling average for lead_time=6 would include all values of a between lead_time = 6 and lead_time = 10.

The value of a_run_avg when lead_time = 1 should be 5. It is the average of all values in the a column for the group.

The value of a_run_avg when lead_time = 10 should be 10, i.e., it is a rolling mean of only one value.

How can the example code be modified to achieve my descending rolling average?


Solution

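  • 1) rollapply: rollmean does not accept a vector of widths, but rollapply does. Passing n():1 as the width with align = "left" gives each row a window that runs from that row to the end of its group.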

    library(dplyr)
    library(zoo)
    
    the_df %>%
      mutate(
         across(
          .cols = any_of(c("a", "b")), 
          .fns = ~ rollapply(.x, n():1, mean, align = "left"),
          .names = "avg_{.col}"
         ), .by = id)
    

    giving

    # A tibble: 22 × 5
       id        a     b avg_a avg_b
       <chr> <int> <dbl> <dbl> <dbl>
     1 x        10   100   5      50
     2 x         9    90   4.5    45
     3 x         8    80   4      40
     4 x         7    70   3.5    35
     5 x         6    60   3      30
     6 x         5    50   2.5    25
     7 x         4    40   2      20
     8 x         3    30   1.5    15
     9 x         2    20   1      10
    10 x         1    10   0.5     5
    # ℹ 12 more rows
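
    As a cross-check on what the shrinking left-aligned window computes, here is a minimal base R sketch. The helper name tail_means is hypothetical (it is not part of zoo or dplyr); it simply averages x[i], ..., x[n] for every position i, which is what a window of width n - i + 1 starting at row i covers.

```r
# Hypothetical helper, for illustration only: the mean of x[i..n] for each i.
tail_means <- function(x) {
  n <- length(x)
  vapply(seq_len(n), function(i) mean(x[i:n]), numeric(1))
}

# Column a for one group of the example input
a <- 10:0
tail_means(a)
# starts at 5 (mean of all 11 values) and drops by 0.5 per row down to 0
```

    The first ten values agree with the avg_a column shown above.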
    

    2) cummean: dplyr has a cummean function, so reversing each column, taking the cumulative mean, and reversing the result would also work.

    library(dplyr)
    
    the_df %>%
      mutate(
         across(
          .cols = any_of(c("a", "b")), 
          .fns = ~ rev(cummean(rev(.x))),
          .names = "avg_{.col}"
         ), .by = id)
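
    The same reverse, accumulate, reverse idea can be sketched without any packages. This is an illustration under stated assumptions rather than part of the original answer: rev_cummean is a hypothetical helper name, and base R's ave() stands in for the grouped mutate.

```r
# Hypothetical helper: cumulative mean taken from the end of the vector.
rev_cummean <- function(x) rev(cumsum(rev(x)) / seq_along(x))

# Same shape as the example input, built with base R only
df <- data.frame(
  id = rep(c("x", "y"), each = 11),
  a  = rep(10:0, 2),
  b  = rep(seq(100, 0, by = -10), 2)
)

# ave() applies the function within each id and keeps the original row order
df$avg_a <- ave(as.numeric(df$a), df$id, FUN = rev_cummean)
df$avg_b <- ave(as.numeric(df$b), df$id, FUN = rev_cummean)
head(df, 3)
```

    Each value is a running sum divided by a running count, so this takes a single pass per group instead of recomputing a mean for every window.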
    

    Note

    The input used

    library(tibble)
    the_df <- tibble(id = c(rep('x', 11), rep('y', 11)),
      a = rep(10:0, 2), b = rep(seq(100, 0, by = -10), 2))