Search code examples
rdplyrmean

Rollmean over one month with different groups in R


I have following extract of my dataset:

library(dyplr)
library(runner)
example <- data.frame(Date <- c("2020-03-24", "2020-04-06" ,"2020-04-08" ,
                                "2020-04-13", "2020-04-14", "2020-04-15",
                                "2020-04-16", "2020-04-18", "2020-04-23",
                                "2020-04-24", "2020-04-26", "2020-04-29",
                                "2020-03-24", "2020-04-06" ,"2020-04-08" ,
                                "2020-04-01", "2020-04-12", "2020-04-15",
                                "2020-04-17", "2020-04-18", "2020-04-22",
                                "2020-05-01", "2020-05-15", "2020-05-29",
                                "2020-03-08", "2020-04-06" ,"2020-04-15",
                                "2020-04-22", "2020-04-28", "2020-05-05",
                                "2020-05-08", "2020-05-22", "2020-05-23"),
                      username <- c("steves_" ,"steves_" ,"steves_", 
                                    "steves_" ,"steves_" ,"steves_", 
                                    "steves_" ,"steves_" ,"steves_",
                                    "steves_" ,"steves_" ,"steves_",
                                    "jules_" ,"jules_" ,"jules_", 
                                    "jules_" ,"jules_" ,"jules_", 
                                    "jules_" ,"jules_" ,"jules_", 
                                    "jules_" ,"jules_" ,"jules_", 
                                    "mia" ,"mia" ,"mia", 
                                    "mia" ,"mia" ,"mia", 
                                    "mia" ,"mia" ,"mia"),
                      ER <- as.numeric(c("0.092", "0.08", "0.028",
                                         "0.1", "0.09", "0.02", 
                                         "0.02", "0.8", "0.001", 
                                         "0.001", "0.1", "0.098", 
                                         "0.001", "0.002","0.02", 
                                         "0.0098", "0.002","0.0019",
                                         "0.002", "0.11","0.002", 
                                         "0.02", "0.01", "0.009", 
                                         "0.19", "0.09", "0.21",
                                          "0.22", "0.19", "0.22",
                                         "0.09", "0.19", "0.28")))
                    
colnames(example) <- c("Date", "username", "ER")

example$Date <- as.Date(example$Date)
str(example)

I would like to calculate the respective average of the ER over a month from the respective dates. I know that there are similar contributions to this already in the forum - but unfortunately I could not find the solution for me.

I have tried the following solutions:

example$avgER_30days <- example %>% 
  arrange(username, Date) %>% 
  group_by(username) %>% 
  mutate(rollmean(example$ER, Date > (Date %m-% months(1)) & Date < Date, fill = NA))

or with the package runners

example$average <- example %>%
  group_by(username) %>%
  arrange(username, Date) %>%
  mutate(mean_run(x = example$ER, k = 30, lag = 1, idx=example$Date)) %>%
  ungroup(username)

I would be happy if you could help me!


Solution

  • Here are two equivalent alternatives.

    In the first alternative below, the second argument to rollapplyr is a list such that the ith component is the vector of offsets to average over for the ith row of the group.

    In the second alternative we can specify the width as a vector of widths, one per row, and then when taking the mean eliminate the last value.

    Note that w is slightly different in the two alternatives.

    Review ?rollapply for details on the arguments and for further examples.

    library(dplyr, exclude = c("filter", "lag"))
    library(zoo)
    
    example %>% 
      arrange(username, Date) %>% 
      group_by(username) %>% 
      mutate(w = seq_along(Date) - findInterval(Date - 30, Date) - 1, 
        avg30 = rollapplyr(ER, lapply(-w, seq, to = -1), mean, fill=NA)) %>%
      ungroup
    
    example %>% 
      arrange(username, Date) %>% 
      group_by(username) %>% 
      mutate(w = seq_along(Date) - findInterval(Date - 30, Date), 
        avg30 = rollapplyr(ER, w, function(x) mean(head(x, -1)), fill = NA)) %>%
      ungroup