Search code examples
rdplyrdata-analysismeanmutate

Mutate in a group by taking mean of only the rows in the group that equal the selected row on another column


I'm running into an issue where I am grouping a data set by one variable (group) and want to take create a new output column that is the mean of the value column within each group AND with the added condition that we only want to consider values that are within 1 of the given row's value for the column day in the calculation for the mean.

So if three rows in the same have day={1,2,3}, the first and third rows will only have means including themselves and the second row, but the mean of the second row includes all three values.

Essentially, I'm wondering how to access the value of day of the SPECIFIC ROW and then use that to take a subset of the group of rows with a value of day close to that number.

So if the input is

df <- read.table(text = "
index    group  day     value
1        A      1       10
2        A      2       20
3        A      3       30
4        B      2       20
5        B      3       30
6        B      5       50", header = TRUE)

The output should look like

index    group  day     value  output
1        A      1       10      15
2        A      2       20      20
3        A      3       30      25
4        B      2       20      25
5        B      3       30      25
6        B      5       50      50

I've tried grouping by group and day, but then including the rows with day=+-1 of a given row in the mean seems to be too computationally intensive to run with the large dataset I am using.


Solution

  • df %>%
      group_by(group) %>%
      mutate(output = slider::slide_index_dbl(value, day, mean, .before = 1, .after = 1)) %>%
      ungroup()
    

    Or with dplyr v1.1.0+, skip the group_by/ungroup and instead add .by = group in the mutate line.

    Result

    # A tibble: 6 × 5
      index group   day value output
      <int> <chr> <int> <int>  <dbl>
    1     1 A         1    10     15
    2     2 A         2    20     20
    3     3 A         3    30     25
    4     4 B         2    20     25
    5     5 B         3    30     25
    6     6 B         5    50     50