Can different parts of dplyr::summarize() be computed conditionally?


Is it possible to have conditional statements operate on different parts of dplyr::summarize()?

Suppose I am summarizing the iris data and want to include the mean of Sepal.Length only when requested. I could do something like:

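library(dplyr)  # provides %>%, group_by() and summarize()

data(iris)
include_length <- TRUE
if (include_length) {
  # Summary with the optional Sepal.Length column
  iris %>% 
    group_by(Species) %>%
    summarize(mean_sepal_width = mean(Sepal.Width), mean_sepal_length = mean(Sepal.Length))
} else {
  # Same pipeline, minus the optional column
  iris %>% 
    group_by(Species) %>%
    summarize(mean_sepal_width = mean(Sepal.Width))
}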

But is there a way to put the conditional inside the pipeline, so that the pipeline itself does not need to be duplicated?


Solution

  • You can use the .dots parameter of dplyr's SE (standard evaluation) functions to build the summary programmatically, e.g.

    library(dplyr)
    
    take_means <- function(include_length){
        iris %>% 
            group_by(Species) %>%
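            summarize_(mean_sepal_width = ~mean(Sepal.Width), 
                       # if() without else returns NULL when the condition
                       # is FALSE, so no extra column is added in that case
                       .dots = if(include_length){
                           list(mean_sepal_length = ~mean(Sepal.Length))
                       })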
    }
    
    take_means(TRUE)
    #> # A tibble: 3 × 3
    #>      Species mean_sepal_width mean_sepal_length
    #>       <fctr>            <dbl>             <dbl>
    #> 1     setosa            3.428             5.006
    #> 2 versicolor            2.770             5.936
    #> 3  virginica            2.974             6.588
    
    take_means(FALSE)
    #> # A tibble: 3 × 2
    #>      Species mean_sepal_width
    #>       <fctr>            <dbl>
    #> 1     setosa            3.428
    #> 2 versicolor            2.770
    #> 3  virginica            2.974
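
Note that summarize_() and the other underscore-suffixed verbs have been deprecated since dplyr 0.7. On newer versions, one possible equivalent is to capture the optional summaries with quos() and splice them into summarize() with !!!. The following is only a sketch of that idea, assuming dplyr 0.7 or later; the variable name extra is made up for illustration.

```r
library(dplyr)

take_means <- function(include_length) {
  # Optional summaries are captured unevaluated with quos();
  # an empty quos() contributes no columns when spliced.
  # `extra` is an illustrative name, not part of dplyr.
  extra <- if (include_length) {
    quos(mean_sepal_length = mean(Sepal.Length))
  } else {
    quos()
  }

  iris %>%
    group_by(Species) %>%
    # !!! splices each captured expression in as its own named argument
    summarize(mean_sepal_width = mean(Sepal.Width), !!!extra)
}

with_length <- take_means(TRUE)
without_length <- take_means(FALSE)

names(with_length)
names(without_length)
```

As in the .dots version, the pipeline is written only once; the condition decides solely what goes into extra.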