rdplyrpurrr

Mutate across multiple columns while indexing with a list column


I am trying to compute a mean across multiple columns v1 and v2 while subsetting on the index contained within another list column k. Here is a toy example.

library(dplyr)
set.seed(1452)

df <- tibble(id = 1:10,
             v1 = rnorm(10, 3, 10),
             v2 = runif(10, -5, 5),
             k = list(c(1,2,3), c(2,4,10), c(2,4,6), c(6,8,9), 
                      c(3,5,7), c(1,8,10), c(2,3,5), c(1,4,9),
                      c(6,7,10), c(3,6,8))
             )

df %>% 
  mutate(across(v1:v2, ~mean(.x[k]), .names = "{.col}_mean"))

# An alternative that works on individual columns, but not on a set as 
in across()
df %>%
  mutate(v1_mean = purrr::map_dbl(k, ~ mean(v1[.x])),
         v2_mean = purrr::map_dbl(k, ~ mean(v2[.x])))

So, for instance, I need the first value of v1_mean column to be the average of v1 for rows 1, 2 and 3. This is just a toy example, but in the real dataset I have many more columns that may or may not change depending on user input, making individual commands unfeasible.


Solution

  • In base R:

    cols <- grep("v", colnames(df), value = TRUE)
    df[cols] <- sapply(df[cols], \(x){
      sapply(df[["k"]], \(y){
        mean(x[y])
      })
    })
    
    df[cols]
    #               v1         v2
    #  [1,] -2.0045776 -2.9066536
    #  [2,]  6.9007768 -0.8388712
    #  [3,]  0.7163411 -2.0565709
    #  [4,] -8.8294785 -0.1457720
    #  [5,] -2.5560838  0.1835374
    #  [6,] -0.1574661  0.9343264
    #  [7,] -1.7014777 -2.3139597
    #  [8,]  7.5872896 -1.5040707
    #  [9,] -8.4803950  2.5870195
    # [10,] -8.1981119 -1.4194570