Search code examples
rrangeslidertidyversemean

Slide window mean


I'm trying to do a slide window mean in a dataframe with fixed number of elements. I'm usint tidyverse and the slider package. lets use this example

> example
# A tibble: 28 × 3
# Groups:   a [4]
   a          b order
   <chr>  <dbl> <dbl>
 1 a     0.199      1
 2 a     0.711      2
 3 a     0.506      3
 4 a     0.0233     4
 5 a     0.994      5
 6 a     0.0920     6
 7 a     0.0633     7
 8 b     0.208      1
 9 b     0.536      2
10 b     0.954      3
# … with 18 more rows

I want to walk in this dataframe collecting fixed 5 rows of a same group (column a), calculate the mean of the values in the column b, and get a final dataframe with the mean of each consecutive 5 rows. I'd like to do something less verbose than

example %>% 
  filter(a=='a') %>% 
  filter(order >= 1 & order <= 5) %>% 
  summarise(
    bmean = mean(b)
  ) %>% 
  bind_rows(
    example %>% 
      filter(a=='a') %>% 
      filter(order >= 2 & order <= 6) %>% 
      summarise(
        bmean = mean(b)
      ) 
  ) %>% 
  bind_rows(
    example %>% 
      filter(a=='a') %>% 
      filter(order >= 3 & order <= 7) %>% 
      summarise(
        bmean = mean(b)
      )
  )
....

In the end I'd like to have something like this

# A tibble: 3 × 2
  a     bmean
  <chr> <dbl>
1 a     0.487
2 a     0.465
3 a     0.336

dput of the example

structure(list(a = c("a", "a", "a", "a", "a", "a", "a", "b", 
"b", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c", 
"d", "d", "d", "d", "d", "d", "d"), b = c(0.199394755531102, 
0.711478831479326, 0.505893802503124, 0.0233104680664837, 0.994404575554654, 
0.092042422387749, 0.0632751111406833, 0.208315970376134, 0.535682428395376, 
0.95367618277669, 0.960666978731751, 0.498914737952873, 0.930273110279813, 
0.729706238722429, 0.248024080879986, 0.411097032949328, 0.235628247493878, 
0.801356019219384, 0.921010897262022, 0.0789776453748345, 0.887602533679456, 
0.137188882334158, 0.454170317854732, 0.92619909462519, 0.556802915409207, 
0.396265675779432, 0.345948834205046, 0.904526130529121), order = c(1, 
2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 
2, 3, 4, 5, 6, 7)), class = c("grouped_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -28L), groups = structure(list(
    a = c("a", "b", "c", "d"), .rows = structure(list(1:7, 8:14, 
        15:21, 22:28), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))

Thanks in advance


Solution

  • As the data is already grouped, we don't need the group_by, just use slide on the 'b' column with .after = 4

    library(dplyr)
    library(slider)
    example %>% 
     summarise(bmean = slide_dbl(b, .f = mean, .after = 4), .groups = 'drop')
    

    -output

    # A tibble: 28 × 2
       a      bmean
       <chr>  <dbl>
     1 a     0.487 
     2 a     0.465 
     3 a     0.336 
     4 a     0.293 
     5 a     0.383 
     6 a     0.0777
     7 a     0.0633
     8 b     0.631 
     9 b     0.776 
    10 b     0.815 
    # … with 18 more rows
    

    If we need to get the mean only when there are 5 elements, specify .complete = TRUE to return as NA and later remove the NA elements (na.omit)

    example %>% 
     summarise(bmean = slide_dbl(b, .f = mean, .after = 4,
        .complete = TRUE), .groups = 'drop') %>%
     na.omit()
    

    -output

    # A tibble: 12 × 2
       a     bmean
       <chr> <dbl>
     1 a     0.487
     2 a     0.465
     3 a     0.336
     4 b     0.631
     5 b     0.776
     6 b     0.815
     7 c     0.523
     8 c     0.490
     9 c     0.585
    10 d     0.494
    11 d     0.536
    12 d     0.626