I want to generate dummy indicators for each id from the categorical variable fruit, so that every id ends up with one row and one 0/1 column per fruit. I get the warning below when I use summarise_all with a self-defined function. I also tried summarise_all(any), which gave me a warning about coercing double to logical. Is there a more efficient or more up-to-date way to implement this? Thanks a lot!
library(dplyr)  # provides %>%, bind_cols(), select(), group_by(), summarise_all()

fruit = c("apple", "banana", "orange", "pear",
          "strawberry", "blueberry", "durian",
          "grape", "pineapple")
# Three ids with 3, 5 and 6 randomly drawn fruits each
df_sample = data.frame(id = c(rep("a", 3), rep("b", 5), rep("c", 6)),
                       fruit = c(sample(fruit, replace = TRUE, size = 3),
                                 sample(fruit, replace = TRUE, size = 5),
                                 sample(fruit, replace = TRUE, size = 6)))
fruit_indicator =
  model.matrix(~ -1 + fruit, df_sample) %>%
  as.data.frame() %>%
  bind_cols(df_sample) %>%
  select(-fruit) %>%
  group_by(id) %>%
  summarise_all(funs(ifelse(any(. > 0), 1, 0)))
# Warning message:
# `funs()` is deprecated as of dplyr 0.8.0.
# Please use a list of either functions or lambdas:
#
# # Simple named list:
# list(mean = mean, median = median)
#
# # Auto named with `tibble::lst()`:
# tibble::lst(mean, median)
#
# # Using lambdas
# list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
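You can use across(), which is available in dplyr 1.0.0 or higher. Inside summarise() it applies the function to every non-grouping column, so id is left untouched.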
library(dplyr)
model.matrix(~ -1 + fruit, df_sample) %>%
  as.data.frame() %>%
  bind_cols(df_sample) %>%
  select(-fruit) %>%
  group_by(id) %>%
  # everything() is spelled out because dplyr 1.1.0 deprecates across() without .cols
  summarise(across(everything(), ~ as.integer(any(. > 0))))
# id fruitapple fruitbanana fruitdurian fruitgrape fruitpear
#* <chr> <int> <int> <int> <int> <int>
#1 a 0 1 1 0 1
#2 b 1 0 0 1 0
#3 c 0 1 0 1 1
# … with 1 more variable: fruitpineapple <int>
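Because df_sample is drawn with sample(), the output above changes from run to run. To check the approach on something reproducible, here is a minimal sketch on a small hand-made data frame. The name df_demo and its values are made up for illustration and are not part of the question. The last lines are a base R cross-check with table(), which is an alternative I am adding for comparison rather than something taken from the answer above.

```r
library(dplyr)

# Hypothetical fixed data, so the result does not depend on sample()
df_demo <- data.frame(
  id    = c("a", "a", "b", "b", "b"),
  fruit = c("apple", "pear", "apple", "apple", "grape")
)

indicators <- model.matrix(~ -1 + fruit, df_demo) %>%
  as.data.frame() %>%
  bind_cols(df_demo) %>%
  select(-fruit) %>%
  group_by(id) %>%
  # One row per id; 1 if the id has the fruit at least once, else 0
  summarise(across(everything(), ~ as.integer(any(.x > 0))))

# Expected indicator values:
#   id  fruitapple  fruitgrape  fruitpear
#   a   1           0           1
#   b   1           1           0

# Base R cross-check: count id/fruit pairs, then flag counts above zero
check <- (table(df_demo$id, df_demo$fruit) > 0) * 1L
```

The table() version returns a matrix with ids as row names instead of a data frame with an id column, so which one fits better depends on what you do with the result afterwards.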