Search code examples
rdplyr

R: How to average a column between specific values in another?


I have a potentially unique problem where I would like to average some values in a given column by using the mutate() function in dplyr, but only between those rows that contain specific strings in another column.

For example, I have two subjects (Person 1 and Person 2) and three different types of tests (test column) they performed. Test 1 and Test 2 are the same test but performed on different limbs, so I would like to group by each subject (name column) and average the values in the error column, but only between rows that contain Test 1 and Test 2. error values for Test 3 I'd like left untouched for each subject. Hope that makes sense, and any help will be appreciated.

set.seed(1)
dat <- data.frame(
  name = rep(paste("Person", 1:2), each = 30),
  test = rep(paste("Test", 1:3), each = 10),
  value = round(rnorm(60, 100, 20)),
  error = round(runif(60, 2, 15), 1)
  )

Below is the first 3 rows by group of the above data. For the output, the number of rows should be the same, but the error values based on the above conditions aggregated. For example, when looking at Person 1 rows 1 and 4 should be the average of rows 1 and 4 (first Test 1 observations), row 2 and 5 should be the average of rows 2 and 5 and so on. Test 3 is left untouched.

# A tibble: 18 × 4
# Groups:   name, test [6]
   name     test   value error
   <chr>    <chr>  <dbl> <dbl>
 1 Person 1 Test 1    87  14.9
 2 Person 1 Test 1   104   8.4
 3 Person 1 Test 1    83   8.3
 4 Person 1 Test 2   130   9.5
 5 Person 1 Test 2   108   3  
 6 Person 1 Test 2    88   2.5
 7 Person 1 Test 3   118  10.9
 8 Person 1 Test 3   116   9.8
 9 Person 1 Test 3   101   5.1
10 Person 2 Test 1   127  10  
11 Person 2 Test 1    98   9.2
12 Person 2 Test 1   108   6.3
13 Person 2 Test 2    97   5.7
14 Person 2 Test 2    95  13.6
15 Person 2 Test 2   114   7.8
16 Person 2 Test 3   108  10.2
17 Person 2 Test 3    88  12.9
18 Person 2 Test 3   107  13.1

Solution

  • Assuming you want to maintain the value and error columns e.g. keep all rows:

    library(dplyr)
    library(slider)
    
    result <- dat |>
      group_by(name, test) |>
      mutate(
        error_roll_mean = ifelse(
          test %in% c("Test 1", "Test 2"),
          case_when(
            row_number() <= 3 ~ NA,
            .default = slide_dbl(error, mean, .before = 3, .complete = TRUE)
          ),
          error
        )
      ) |>
      ungroup()
    
    data.frame(result)
    #        name   test value error error_roll_mean
    # 1  Person 1 Test 1    87  14.9              NA
    # 2  Person 1 Test 1   104   8.4              NA
    # 3  Person 1 Test 1    83   8.3              NA
    # 4  Person 1 Test 1   132   4.3           8.975
    # 5  Person 1 Test 1   107  11.8           8.200
    # 6  Person 1 Test 1    84   7.9           8.075
    # 7  Person 1 Test 1   110   8.6           8.150
    # 8  Person 1 Test 1   115   4.7           8.250
    # 9  Person 1 Test 1   112   5.0           6.550
    # 10 Person 1 Test 1    94   9.7           7.000
    # 11 Person 1 Test 2   130   9.5              NA
    # 12 Person 1 Test 2   108   3.0              NA
    # 13 Person 1 Test 2    88   2.5              NA
    # 14 Person 1 Test 2    56  10.4           6.350
    # 15 Person 1 Test 2   122  14.1           7.500
    # 16 Person 1 Test 2    99   9.8           9.200
    # 17 Person 1 Test 2   100   9.3          10.900
    # 18 Person 1 Test 2   119   8.8          10.500
    # 19 Person 1 Test 2   116  14.8          10.675
    # 20 Person 1 Test 2   112   8.6          10.375
    # 21 Person 1 Test 3   118  10.9          10.900
    # 22 Person 1 Test 3   116   9.8           9.800
    # 23 Person 1 Test 3   101   5.1           5.100
    # 24 Person 1 Test 3    60   5.4           5.400
    # 25 Person 1 Test 3   112  11.5          11.500
    # 26 Person 1 Test 3    99   7.9           7.900
    # 27 Person 1 Test 3    97   4.3           4.300
    # 28 Person 1 Test 3    71  11.7          11.700
    # 29 Person 1 Test 3    90   3.4           3.400
    # 30 Person 1 Test 3   108  13.2          13.200
    # 31 Person 2 Test 1   127  10.0              NA
    # 32 Person 2 Test 1    98   9.2              NA
    # 33 Person 2 Test 1   108   6.3              NA
    # 34 Person 2 Test 1    99   7.9           8.350
    # 35 Person 2 Test 1    72   8.5           7.975
    # 36 Person 2 Test 1    92   4.4           6.775
    # 37 Person 2 Test 1    92   8.9           7.425
    # 38 Person 2 Test 1    99   3.0           6.200
    # 39 Person 2 Test 1   122   5.6           5.475
    # 40 Person 2 Test 1   115   4.8           5.575
    # 41 Person 2 Test 2    97   5.7              NA
    # 42 Person 2 Test 2    95  13.6              NA
    # 43 Person 2 Test 2   114   7.8              NA
    # 44 Person 2 Test 2   111  12.1           9.800
    # 45 Person 2 Test 2    86  13.4          11.725
    # 46 Person 2 Test 2    86   7.4          10.175
    # 47 Person 2 Test 2   107   2.8           8.925
    # 48 Person 2 Test 2   115   6.4           7.500
    # 49 Person 2 Test 2    98  11.4           7.000
    # 50 Person 2 Test 2   118   6.4           6.750
    # 51 Person 2 Test 3   108  10.2          10.200
    # 52 Person 2 Test 3    88  12.9          12.900
    # 53 Person 2 Test 3   107  13.1          13.100
    # 54 Person 2 Test 3    77   7.1           7.100
    # 55 Person 2 Test 3   129   6.9           6.900
    # 56 Person 2 Test 3   140  13.6          13.600
    # 57 Person 2 Test 3    93  10.4          10.400
    # 58 Person 2 Test 3    79  11.6          11.600
    # 59 Person 2 Test 3   111   9.9           9.900
    # 60 Person 2 Test 3    97  13.7          13.700