In a previous question I wanted to carry out case_when
with a dynamic number of cases. The solution was to use parse_exprs
along with !!!
. I am looking for a similar solution to mutate/summarise with a dynamic number of columns.
Consider the following dataset.
library(dplyr)
library(rlang)
data(mtcars)
mtcars = mtcars %>%
mutate(g2 = ifelse(gear == 2, 1, 0),
g3 = ifelse(gear == 3, 1, 0),
g4 = ifelse(gear == 4, 1, 0))
Suppose I want to sum the columns g2
, g3
, g4
. If I know these are the columns names then this is simple, standard dplyr:
answer = mtcars %>%
summarise(sum_g2 = sum(g2),
sum_g3 = sum(g3),
sum_g4 = sum(g4))
But suppose I do not know how many columns there are, or their exact names. Instead, I have a vector containing all the column names I care about. Following the logic in the accepted answer of my previous approach I would use:
columns_to_sum = c("g2","g3","g4")
formulas = paste0("sum_",columns_to_sum," = sum(",columns_to_sum,")")
answer = mtcars %>%
summarise(!!!parse_exprs(formulas))
If this did work, then regardless of the column names provided as input in columns_to_sum
, I should receive the sum of the corresponding columns. However, this is not working. Instead of a column named sum_g2
containing sum(g2)
I get a column called "sum_g2 = sum(g2)"
and every value in this column is a zero.
Given that I can pass formulas into case_when
it seems like I should be able to pass formulas into summarise
(and the same idea should also work for mutate
because they all use the rlang package).
In the past there were string versions of mutate
and summarise
(mutate_
and summarise_
) that you could pass formulas to as strings. But these have been retired as the rlang approach is the intended approach now. The related questions I reviewed on Stackoverflow did not use the rlang quotation approach and hence are not sufficient for my purposes.
How do I summarise with a dynamic number of columns (using an rlang approach)?
One option since dplyr 1.0.0
could be:
mtcars %>%
summarise(across(all_of(columns_to_sum), sum, .names = "sum_{col}"))
sum_g2 sum_g3 sum_g4
1 0 15 12