I want to define a custom function which groups and summarises some data using dplyr and which, depending on a Boolean flag, can group by an additional variable. I can achieve this using a full if ... else control block, as in this trivial example:
library(tidyverse)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(by_age = FALSE) {
if (by_age) {
bar <- Titanic %>%
group_by(Survived, Age)
} else {
bar <- Titanic %>%
group_by(Survived)
}
bar %>%
summarise(n = sum(n))
}
foo()
foo(by_age = TRUE)
But this seems a very clumsy way round. Is there a way I can achieve this with a single block of dplyr code, conditionally adding Age as a second grouping variable? I've tried ifelse(by_age, Age, NA) in my group_by statement, and some of the techniques listed in this SO post, but to no avail.
Sorry, I didn't read your linked SO post; if you want to avoid the ... approach for some reason, this is one potential solution:
library(tidyverse)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(by_age = FALSE) {
Titanic %>%
group_by(Survived, if(by_age) Age) %>%
summarise(n = sum(n))
}
foo()
#> # A tibble: 2 × 2
#> Survived n
#> <chr> <dbl>
#> 1 No 1490
#> 2 Yes 711
foo(by_age = TRUE)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups: Survived [2]
#> Survived `if (by_age) Age` n
#> <chr> <chr> <dbl>
#> 1 No Adult 1438
#> 2 No Child 52
#> 3 Yes Adult 654
#> 4 Yes Child 57
Created on 2022-07-07 by the reprex package (v2.0.1)
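For what it's worth, the reason a plain if works here while ifelse(by_age, Age, NA) does not comes down to base R semantics. A minimal sketch, using a made-up vector age rather than the Titanic data:

```r
# Hypothetical stand-in for the Age column (not the real data)
age <- c("Child", "Adult", "Adult")
by_age <- TRUE

# ifelse() returns a result shaped like its test; by_age has length 1,
# so only the first element of age comes back
ifelse(by_age, age, NA)
#> [1] "Child"

# if returns the whole vector when the condition is TRUE ...
if (by_age) age
#> [1] "Child" "Adult" "Adult"

# ... and NULL (invisibly) when it is FALSE and there is no else branch
is.null(if (FALSE) age)
#> [1] TRUE
```

Judging by the foo() output above, group_by() appears to ignore a grouping expression that evaluates to NULL, which is why only Survived is used when by_age is FALSE.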
To avoid the "Age" column being called "if (by_age) Age" you can use:
library(tidyverse)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(by_age = FALSE) {
Titanic %>%
group_by(Survived, !!sym(ifelse(by_age, "Age", ""))) %>%
summarise(n = sum(n))
}
foo()
#> # A tibble: 2 × 2
#> Survived n
#> <chr> <dbl>
#> 1 No 1490
#> 2 Yes 711
foo(by_age = TRUE)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups: Survived [2]
#> Survived Age n
#> <chr> <chr> <dbl>
#> 1 No Adult 1438
#> 2 No Child 52
#> 3 Yes Adult 654
#> 4 Yes Child 57
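Another option, offered as a sketch rather than the one right way: build the grouping columns as a character vector and select them with across(all_of(...)). This assumes dplyr 1.0.0 or later; the by_age flag is from the question, while group_vars is a name introduced here purely for illustration. Setting .groups = "drop" returns an ungrouped result and silences the summarise() message.

```r
library(dplyr)

data(Titanic)
Titanic <- tibble::as_tibble(Titanic)

foo <- function(by_age = FALSE) {
  # if without else yields NULL when by_age is FALSE, and c() drops NULL,
  # so group_vars is either "Survived" or c("Survived", "Age")
  group_vars <- c("Survived", if (by_age) "Age")

  Titanic %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(n = sum(n), .groups = "drop")
}

foo()
foo(by_age = TRUE)
```

Because the columns are selected by name, the second column is called Age without any unquoting. Newer dplyr releases (1.1.0 onwards) recommend pick() in place of across() for pure column selection; across() is used here because it also runs on earlier 1.0.x versions.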
One solution is to use ... (dot-dot-dot) to pass in the argument if/when you want, e.g.
library(tidyverse)
data(Titanic)
Titanic <- as_tibble(Titanic)
foo <- function(...) {
Titanic %>%
group_by(Survived, ...) %>%
summarise(n = sum(n))
}
foo()
#> # A tibble: 2 × 2
#> Survived n
#> <chr> <dbl>
#> 1 No 1490
#> 2 Yes 711
foo(Age)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups: Survived [2]
#> Survived Age n
#> <chr> <chr> <dbl>
#> 1 No Adult 1438
#> 2 No Child 52
#> 3 Yes Adult 654
#> 4 Yes Child 57
# You can also pass in multiple 'extra' arguments
foo(Age, Sex)
#> `summarise()` has grouped output by 'Survived', 'Age'. You can override using
#> the `.groups` argument.
#> # A tibble: 8 × 4
#> # Groups: Survived, Age [4]
#> Survived Age Sex n
#> <chr> <chr> <chr> <dbl>
#> 1 No Adult Female 109
#> 2 No Adult Male 1329
#> 3 No Child Female 17
#> 4 No Child Male 35
#> 5 Yes Adult Female 316
#> 6 Yes Adult Male 338
#> 7 Yes Child Female 28
#> 8 Yes Child Male 29
NB: Using ... comes with two downsides: