Is it possible to have conditional statements operate on different parts of dplyr::summarize()?
Imagine I am working with the iris data and outputting a summary, and I want to include the mean of Sepal.Length only when requested. I could do something like:
library(dplyr)
data(iris)
include_length <- TRUE
if (include_length) {
  iris %>%
    group_by(Species) %>%
    summarize(mean_sepal_width = mean(Sepal.Width),
              mean_sepal_length = mean(Sepal.Length))
} else {
  iris %>%
    group_by(Species) %>%
    summarize(mean_sepal_width = mean(Sepal.Width))
}
But is there a way to implement the conditional within the pipeline so that it does not need to be duplicated?
You can use the .dots parameter of dplyr's SE (standard evaluation) functions to build the summary programmatically, e.g.
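library(dplyr)
take_means <- function(include_length) {
  iris %>%
    group_by(Species) %>%
    summarize_(mean_sepal_width = ~mean(Sepal.Width),
               # if() without else yields NULL when include_length is FALSE,
               # so .dots adds no extra column in that case
               .dots = if (include_length) {
                 list(mean_sepal_length = ~mean(Sepal.Length))
               })
}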
take_means(TRUE)
#> # A tibble: 3 × 3
#>      Species mean_sepal_width mean_sepal_length
#>       <fctr>            <dbl>             <dbl>
#> 1     setosa            3.428             5.006
#> 2 versicolor            2.770             5.936
#> 3  virginica            2.974             6.588
take_means(FALSE)
#> # A tibble: 3 × 2
#>      Species mean_sepal_width
#>       <fctr>            <dbl>
#> 1     setosa            3.428
#> 2 versicolor            2.770
#> 3  virginica            2.974
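Note that the underscore-suffixed verbs such as summarize_() have been deprecated since dplyr 0.7.0. On newer versions, the same idea can probably be expressed by collecting the optional expressions in a list of quosures and splicing it into summarize() with !!!. This is only a sketch under that assumption; the name extra is hypothetical and chosen for illustration.

```r
library(dplyr)

take_means <- function(include_length) {
  # Optional summaries, captured unevaluated with quos();
  # an empty list splices to nothing, so no column is added.
  extra <- if (include_length) {
    quos(mean_sepal_length = mean(Sepal.Length))
  } else {
    list()
  }

  iris %>%
    group_by(Species) %>%
    summarize(mean_sepal_width = mean(Sepal.Width), !!!extra)
}

names(take_means(TRUE))
names(take_means(FALSE))
```

The pipeline is written once, and only the list of extra expressions depends on the flag, which is the same design as the .dots approach above.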