I am trying to compute a mean across multiple columns, v1 and v2, while subsetting on the indices contained in another list column, k. Here is a toy example.
library(dplyr)
set.seed(1452)
df <- tibble(id = 1:10,
             v1 = rnorm(10, 3, 10),
             v2 = runif(10, -5, 5),
             k = list(c(1,2,3), c(2,4,10), c(2,4,6), c(6,8,9),
                      c(3,5,7), c(1,8,10), c(2,3,5), c(1,4,9),
                      c(6,7,10), c(3,6,8))
)
df %>%
  mutate(across(v1:v2, ~mean(.x[k]), .names = "{.col}_mean"))
# An alternative that works on individual columns, but not on a set as in across()
df %>%
  mutate(v1_mean = purrr::map_dbl(k, ~ mean(v1[.x])),
         v2_mean = purrr::map_dbl(k, ~ mean(v2[.x])))
So, for instance, I need the first value of the v1_mean column to be the average of v1 for rows 1, 2 and 3. This is just a toy example; in the real dataset I have many more columns, and the set may change depending on user input, so writing a separate command for each column is not feasible.
In base R:
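# Columns to average: every column whose name contains "v" (here v1 and v2)
cols <- grep("v", colnames(df), value = TRUE)
# For each selected column x, take the mean of x at the row indices in each element of k.
# Note: this overwrites v1 and v2 with their subset means.
df[cols] <- sapply(df[cols], \(x) {
  sapply(df[["k"]], \(y) {
    mean(x[y])
  })
})
df[cols]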
#               v1         v2
#  [1,] -2.0045776 -2.9066536
#  [2,]  6.9007768 -0.8388712
#  [3,]  0.7163411 -2.0565709
#  [4,] -8.8294785 -0.1457720
#  [5,] -2.5560838  0.1835374
#  [6,] -0.1574661  0.9343264
#  [7,] -1.7014777 -2.3139597
#  [8,]  7.5872896 -1.5040707
#  [9,] -8.4803950  2.5870195
# [10,] -8.1981119 -1.4194570
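If you would rather stay in dplyr and keep the original columns, one possible approach is to nest purrr::map_dbl() inside across(), so that each selected column is averaged over the row indices stored in k. This is only a minimal sketch on the toy data from the question. It assumes R >= 4.1 for the `\(idx)` lambda, and the names `idx` and `res` are mine, not part of the question.

```r
library(dplyr)
library(purrr)

set.seed(1452)
df <- tibble(id = 1:10,
             v1 = rnorm(10, 3, 10),
             v2 = runif(10, -5, 5),
             k = list(c(1,2,3), c(2,4,10), c(2,4,6), c(6,8,9),
                      c(3,5,7), c(1,8,10), c(2,3,5), c(1,4,9),
                      c(6,7,10), c(3,6,8)))

# .x is the whole column currently handled by across();
# idx (a name chosen here for illustration) is one element of the list column k.
res <- df %>%
  mutate(across(v1:v2,
                ~ map_dbl(k, \(idx) mean(.x[idx])),
                .names = "{.col}_mean"))

res
```

The inner function uses `\(idx)` rather than a second `~` lambda so that `.x` keeps referring to the column supplied by across(). If the column names come from user input as a character vector, `across(all_of(cols), ...)` should let you select them in place of `v1:v2`.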