I want to apply a conditional rowMeans to each group of IDs using dplyr. The reproducible table follows:
dat <- as.Date("2021/08/04")
len <- 4
seq(dat, by = "day", length.out = len)
input <- data.frame(
date = c(seq(dat, by = "day", length.out = len) , seq(dat, by = "day", length.out = len)),
user_id = c(rep("aa",4),rep("bb",4)),
var1 = c(1:4),
var2 = c(4,7,23,9,0,0,0,0),
var3 = c(0,0,0,0,4,8,2,7)
)
The logic is: For each ID, first select the columns which are numeric and their sum is =!0, and then compute the rowMean of the selected columns for that ID.
The desired output table follows:
output = data.frame(
date = c(seq(dat, by = "day", length.out = len) , seq(dat, by = "day", length.out = len)),
user_id = c(rep("aa",4),rep("bb",4)),
var1 = c(1:4),
var2 = c(4,7,23,9,0,0,0,0),
var3 = c(0,0,0,0,4,8,2,7),
rowAverage = as.numeric(c(rowMeans(input[1:4,3:4]), rowMeans(input[5:8,c(3,5)])))
)
The lines that I came-up with follow, but I get some errors:
output = input %>%
dplyr::group_by(user_id) %>%
dplyr::mutate(rowAverage = rowMeans(select_if(function(x) {(is.numeric(x)) && (sum(x)=!0)})))
Could you please help me correct this? Thank you.
You’re very close. You just need to supply the data to select_if()
, too.
Use cur_data()
for that.
input %>%
group_by(user_id) %>%
mutate(rowAverage = rowMeans(
select_if(cur_data(), \(x) is.numeric(x) && sum(x) != 0)
))
#> # A tibble: 8 × 6
#> # Groups: user_id [2]
#> date user_id var1 var2 var3 rowAverage
#> <date> <chr> <int> <dbl> <dbl> <dbl>
#> 1 2021-08-04 aa 1 4 0 2.5
#> 2 2021-08-05 aa 2 7 0 4.5
#> 3 2021-08-06 aa 3 23 0 13
#> 4 2021-08-07 aa 4 9 0 6.5
#> 5 2021-08-04 bb 1 0 4 2.5
#> 6 2021-08-05 bb 2 0 8 5
#> 7 2021-08-06 bb 3 0 2 2.5
#> 8 2021-08-07 bb 4 0 7 5.5