Search code examples
rzoo

Work out rolling sums for variables with non-consecutive days in a dataframe in R


I have some data on which I'd like to work out the win % across a rolling 14 day window thoughout around 7 years worth of results. The days are non-consecutive so whenever I group by the 'Trainer' variable and run rollapplyr or runSum/ sum_run, I get the past 14 events summarised but can't figure out how to get the 14 days grouped. When I try to define the width or k value using the day from the date, I get an error

invalid time series parameters specified

or vec' must be sorted non-decreasingly and not contain NAs

Edit- Below code gives the error above

df %>% group_by(Trainer) %>% mutate(Fourteen_day_wins =             rollapplyr(Wins, width = 1:n() - findInterval( Date %d-% Days(14), Date), sum)) %>%  ungroup

I'd like to get the total wins and count of events for a 14 day rolling period when grouped by Trainer in new columns in my df. Could someone point me in the right direction please? Still an R novice so has stumped me so far!

Sample df:

structure(list(Trainer = c("Appleby, Charlie", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J",  "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J",  "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J"), Wins = c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0,  0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,  0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0,  0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,  1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,  1, 0, 1, 0, 0), Date = structure(c(1508025600, 1508112000, 1508112000,  1508112000, 1508198400, 1508284800, 1508284800, 1508284800, 1508457600,  1508457600, 1508544000, 1508544000, 1508544000, 1508716800, 1508716800,  1508716800, 1508803200, 1508803200, 1508803200, 1508889600, 1508889600,  1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1509062400,  1509062400, 1509062400, 1509062400, 1509062400, 1509148800, 1509148800,  1509148800, 1509148800, 1509148800, 1509148800, 1509321600, 1509321600,  1509321600, 1509321600, 1509494400, 1509667200, 1509667200, 1509753600,  1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600,  1510099200, 1510099200, 1510099200, 1510358400, 1510358400, 1510358400,  1521936000, 1521936000, 1523923200, 1523923200, 1523923200, 1524009600,  1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600,  1524009600, 1524009600, 1524009600, 1524096000, 1524096000, 1524096000,  1524096000, 1524096000, 1524096000, 1524096000, 1524182400, 1524182400,  1524182400, 1524268800, 1524268800, 1524268800, 1524528000, 1524528000,  1524528000, 1524528000, 1524614400, 1524614400, 1524614400, 1524787200,  1524787200, 1524787200, 1524787200, 1524787200, 1525132800, 1525219200,  1525219200, 1525219200), tzone = "UTC", class = c("POSIXct",  "POSIXt"))), row.names = c(NA, -101L), class = c("tbl_df", "tbl",  "data.frame"))

Solution

  • The problem is that the arguments to findInterval should be numeric and ordered.

    To address this convert the dates to Date class and then numeric so that d below is the number of days since the Epoch. Now we can use it with findInterval as shown. If the data were already sorted the arrange line could be omitted.

    library(dplyr, exclude = c("filter", "lag"))
    library(zoo)
    
    DF %>%
      arrange(Trainer, Date) %>%
      group_by(Trainer) %>%
      mutate(d = as.numeric(as.Date(Date)), 
             Wins14 = rollapplyr(Wins, 1:n() - findInterval(d - 14, d), sum)) %>%
      ungroup %>%
      select(-d)
    

    giving:

    # A tibble: 101 x 4
       Trainer           Wins Date                Wins14
       <chr>            <dbl> <dttm>               <dbl>
     1 Appleby, Charlie     1 2017-10-15 00:00:00      1
     2 Appleby, Charlie     1 2017-10-18 00:00:00      2
     3 Appleby, Charlie     0 2017-10-18 00:00:00      2
     4 Appleby, Charlie     0 2017-10-23 00:00:00      2
     5 Appleby, Charlie     1 2017-10-25 00:00:00      3
     6 Appleby, Charlie     0 2017-10-25 00:00:00      3
     7 Appleby, Charlie     0 2017-10-25 00:00:00      3
     8 Appleby, Charlie     1 2017-10-25 00:00:00      4
     9 Appleby, Charlie     0 2017-10-27 00:00:00      4
    10 Appleby, Charlie     0 2017-10-27 00:00:00      4
    # ... with 91 more rows