
R: calculating a rolling average with a window based on value (not number of rows or a date/time variable)


I'm quite new to the packages for calculating rolling averages in R, and I hope you can point me in the right direction.

I have the following data as an example:

ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
    314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
    328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
    337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
         1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
         1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)

ms contains time points in milliseconds, and correct indicates whether a specific action was performed correctly
(1 = correct, 0 = not correct).

My goal is to calculate the percentage correct (i.e. the average of correct) over windows spanning a set number of milliseconds. As you can see, some time points are missing and others occur multiple times, so I do not want to filter based on row number. I've looked into packages such as "tidyquant", but it seems to me that these packages need a time/date variable rather than a numeric variable to determine the window over which values are averaged. Is there a way to specify the window based on the numeric value of df$ms?


Solution

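  • Try the following, which counts the correct answers and the observations per millisecond, fills in the missing milliseconds, and then takes rolling sums over 5 ms windows: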

    library(dplyr)
    
    # count the number of values per ms, then drop the grouping again
    df <- df %>%
            group_by(ms) %>%
            mutate(Nb.values = n()) %>%
            ungroup()
    
    # consider a window of 1 ms and compute the percentage for each window
    df2 <- setNames(aggregate(correct ~ factor(df$ms, levels = as.character(seq(min(df$ms), max(df$ms), 1))),
                              df, sum),
                    c("ms", "Count.correct"))
    
    # complete data frame (including unused levels)
    df2 <- tidyr::complete(df2, ms)
    df2$ms <- as.numeric(levels(df2$ms))[df2$ms]
    # add the number of observations per ms (NA where a ms has no data)
    df2 <- df2 %>% left_join(distinct(df[, c("ms", "Nb.values")]), by = "ms")
    
    # compute the rolling proportion correct over windows of 5 consecutive ms
    df2 %>%
            mutate(Window = paste(ms, ms+4, sep = "-"), # add windows
                   Rolling.correct = zoo::rollapply(Count.correct, 5, sum, na.rm = TRUE,
                                                    partial = TRUE, fill = NA, align = "left") /
                           zoo::rollapply(Nb.values, 5, sum, na.rm = TRUE, partial = TRUE,
                                          fill = NA, align = "left")) # add rolling mean
    
    # A tibble: 43 x 5
          ms Count.correct Nb.values  Window Rolling.correct
       <dbl>         <dbl>     <int>   <chr>           <dbl>
     1   300             2         3 300-304            0.40
     2   301             0         1 301-305            0.00
     3   302            NA        NA 302-306            0.25
     4   303             0         1 303-307            0.25
     5   304            NA        NA 304-308            0.25
     6   305             0         2 305-309            0.25
     7   306             1         1 306-310            0.25
     8   307            NA        NA 307-311            0.00
     9   308             0         1 308-312            0.20
    10   309            NA        NA 309-313            0.25
    # ... with 33 more rows
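
For comparison, the same left-aligned 5 ms windows can also be computed in base R by selecting rows directly on the value of ms. This is only a minimal sketch, not part of the answer above: window_mean is a hypothetical helper written for this illustration (it does not come from any package), and it assumes closed windows [start, start + width - 1], which matches the Window labels shown above.

```r
# Data from the question
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)

# Hypothetical helper (illustration only): mean of `value` over the closed
# window [start, start + width - 1] for every window start in `starts`.
# Windows that contain no observations yield NA.
window_mean <- function(time, value, starts, width) {
  sapply(starts, function(s) {
    in_window <- time >= s & time <= s + width - 1
    if (any(in_window)) mean(value[in_window]) else NA_real_
  })
}

# One window per millisecond between the first and last observed time point
starts <- seq(min(df$ms), max(df$ms), by = 1)

res <- data.frame(
  ms = starts,
  Window = paste(starts, starts + 4, sep = "-"),
  Rolling.correct = window_mean(df$ms, df$correct, starts, width = 5)
)

head(res, 10)
```

Because the rows are selected by comparing ms values, duplicated and missing time points need no special handling, and the first ten values of Rolling.correct should agree with the tibble printed above. The loop compares every window against every row, so for long series the zoo-based approach may scale better.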