I'm trying to compute a sliding-window mean in a data frame, using a fixed number of elements per window.
I'm using the tidyverse and the slider package.
Let's use this example:
> example
# A tibble: 28 × 3
# Groups: a [4]
a b order
<chr> <dbl> <dbl>
1 a 0.199 1
2 a 0.711 2
3 a 0.506 3
4 a 0.0233 4
5 a 0.994 5
6 a 0.0920 6
7 a 0.0633 7
8 b 0.208 1
9 b 0.536 2
10 b 0.954 3
# … with 18 more rows
I want to walk through this data frame taking 5 consecutive rows at a time from the same group (column a), calculate the mean of the values in column b, and end up with a data frame holding the mean of each consecutive 5-row window.
I'd like to do something less verbose than this:
example %>%
  filter(a == 'a') %>%
  filter(order >= 1 & order <= 5) %>%
  summarise(
    bmean = mean(b)
  ) %>%
  bind_rows(
    example %>%
      filter(a == 'a') %>%
      filter(order >= 2 & order <= 6) %>%
      summarise(
        bmean = mean(b)
      )
  ) %>%
  bind_rows(
    example %>%
      filter(a == 'a') %>%
      filter(order >= 3 & order <= 7) %>%
      summarise(
        bmean = mean(b)
      )
  )
....
In the end I'd like to have something like this:
# A tibble: 3 × 2
a bmean
<chr> <dbl>
1 a 0.487
2 a 0.465
3 a 0.336
dput of the example:
structure(list(a = c("a", "a", "a", "a", "a", "a", "a", "b",
"b", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c",
"d", "d", "d", "d", "d", "d", "d"), b = c(0.199394755531102,
0.711478831479326, 0.505893802503124, 0.0233104680664837, 0.994404575554654,
0.092042422387749, 0.0632751111406833, 0.208315970376134, 0.535682428395376,
0.95367618277669, 0.960666978731751, 0.498914737952873, 0.930273110279813,
0.729706238722429, 0.248024080879986, 0.411097032949328, 0.235628247493878,
0.801356019219384, 0.921010897262022, 0.0789776453748345, 0.887602533679456,
0.137188882334158, 0.454170317854732, 0.92619909462519, 0.556802915409207,
0.396265675779432, 0.345948834205046, 0.904526130529121), order = c(1,
2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1,
2, 3, 4, 5, 6, 7)), class = c("grouped_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -28L), groups = structure(list(
a = c("a", "b", "c", "d"), .rows = structure(list(1:7, 8:14,
15:21, 22:28), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))
Thanks in advance
As the data is already grouped, we don't need group_by; just apply slide_dbl to the 'b' column with .after = 4, so that each window covers the current row plus the next four rows of the group.
library(dplyr)
library(slider)
example %>%
summarise(bmean = slide_dbl(b, .f = mean, .after = 4), .groups = 'drop')
-output
# A tibble: 28 × 2
a bmean
<chr> <dbl>
1 a 0.487
2 a 0.465
3 a 0.336
4 a 0.293
5 a 0.383
6 a 0.0777
7 a 0.0633
8 b 0.631
9 b 0.776
10 b 0.815
# … with 18 more rows
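The window logic above can be checked on a small self-contained input. This is only a minimal sketch, assuming dplyr >= 1.1.0, where reframe() is the verb meant for results with several rows per group (summarise() still works there but warns about it). The object toy is a hypothetical stand-in built from rounded values of groups a and b in the question.

```r
library(dplyr)
library(slider)

# Hypothetical stand-in for `example`: rounded b values of groups "a" and "b"
toy <- tibble(
  a = rep(c("a", "b"), each = 7),
  b = c(0.199, 0.711, 0.506, 0.0233, 0.994, 0.0920, 0.0633,
        0.208, 0.536, 0.954, 0.961, 0.499, 0.930, 0.730)
)

# .after = 4: each window is the current row plus the next 4 rows of its group.
# Windows near the end of a group are shorter, so the last row of each group
# is just the mean of a single value.
res <- toy %>%
  group_by(a) %>%
  reframe(bmean = slide_dbl(b, mean, .after = 4))

res
```

reframe() always returns an ungrouped tibble, so no .groups argument is needed here.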
If we need the mean only for windows that contain all 5 elements, specify .complete = TRUE so that incomplete windows return NA, and then remove the NA rows (na.omit).
example %>%
  summarise(bmean = slide_dbl(b, .f = mean, .after = 4,
                              .complete = TRUE), .groups = 'drop') %>%
  na.omit()
-output
# A tibble: 12 × 2
a bmean
<chr> <dbl>
1 a 0.487
2 a 0.465
3 a 0.336
4 b 0.631
5 b 0.776
6 b 0.815
7 c 0.523
8 c 0.490
9 c 0.585
10 d 0.494
11 d 0.536
12 d 0.626
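To see why each group of 7 rows yields 3 rows here (7 - 5 + 1 complete windows), the .complete = TRUE result can be compared against the three hand-written windows from the question. This is only a sketch on a plain vector, using the rounded b values of group a as printed in the question; the names b, windowed, kept and manual are illustrative.

```r
library(slider)

# Rounded b values of group "a" as printed in the question
b <- c(0.199, 0.711, 0.506, 0.0233, 0.994, 0.0920, 0.0633)

# Incomplete windows (fewer than 5 elements) come back as NA
windowed <- slide_dbl(b, mean, .after = 4, .complete = TRUE)

# Keep only the complete windows
kept <- windowed[!is.na(windowed)]

# The same three windows written out by hand (rows 1-5, 2-6, 3-7)
manual <- c(mean(b[1:5]), mean(b[2:6]), mean(b[3:7]))

all.equal(kept, manual)
round(kept, 3)
```

Rounded to three decimals, the kept values match the bmean column the question asks for in group a.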