How can I group using a dynamic variable(s) name(s) ?
My example : Try to group on Species
column knowing it is in the grouping_variable
var ?
library(dplyr)
library(dbplyr)
library(DBI)
# My table
iris_table <- tbl(src = my_db_conn, in_schema(schema = "my_schema", table = "iris_table"))
# The grouping variable
grouping_variable <- "Species"
# My tries
iris_table %>%
group_by(across(any_of(grouping_variable))) %>%
summarise(sum_petal_length = sum(Petal.Length))
### ==> returns error
iris_table %>%
group_by(!!!grouping_variable) %>%
summarise(sum_petal_length = sum(Petal.Length))
### ==> returns grouping by the character "Species"
My sessionInfo()
:
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] DBI_1.1.0 dbplyr_1.4.4 dplyr_1.0.0 lubridate_1.7.9
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rstudioapi_0.11 magrittr_1.5 hms_0.5.3 odbc_1.3.0 tidyselect_1.1.0
[7] bit_1.1-15.2 R6_2.4.1 rlang_0.4.7 fansi_0.4.1 blob_1.2.1 tools_3.6.2
[13] utf8_1.1.4 cli_2.0.2 ellipsis_0.3.1 readxl_1.3.1 bit64_0.9-7.1 assertthat_0.2.1
[19] tibble_3.0.3 lifecycle_0.2.0 crayon_1.3.4 zip_2.0.4 purrr_0.3.4 tidyr_1.1.0
[25] vctrs_0.3.2 glue_1.4.1 openxlsx_4.1.5 stringi_1.4.6 cellranger_1.1.0 compiler_3.6.2
[31] pillar_1.4.6 generics_0.0.2 pkgconfig_2.0.3
Here are two potential approaches.
(1) Most similar to the approach you are already using, we first have to tell R that the character string should be treated as symbolic:
iris_table %>%
group_by(!!!syms(grouping_variable)) %>%
summarise(sum_petal_length = sum(Petal.Length))
Note the syms
before the !!!
. This approach uses some features of the rlang package that can be useful in other contexts. However, it is no longer the recommended approach for programming with dplyr.
(2) The recommended approach for doing this kind of programming with dplyr is:
iris_table %>%
group_by(.data[[grouping_variable]]) %>%
summarise(sum_petal_length = sum(Petal.Length))
Both of these approaches will give you the correct SQL translation when working with dbplyr:
data(iris)
iris_table = tbl_lazy(iris, con = simulate_mssql())
# The grouping variable
grouping_variable <- "Species"
# approach 1
iris_table %>%
group_by(!!!syms(grouping_variable)) %>%
summarise(sum_petal_length = sum(Petal.Length))
# translation from approach 1
# <SQL>
# SELECT `Species`, SUM(`Petal.Length`) AS `sum_petal_length`
# FROM `df`
# GROUP BY `Species`
# approach 2
iris_table %>%
group_by(.data[[grouping_variable]]) %>%
summarise(sum_petal_length = sum(Petal.Length))
# translation from approach 2
# <SQL>
# SELECT `Species`, SUM(`Petal.Length`) AS `sum_petal_length`
# FROM `df`
# GROUP BY `Species`