My problem is the following. I have a function `foo` which works inside `dplyr::mutate`. This function accepts tidyselect syntax. I want to build a wrapper function `bar` which should also support tidyselect syntax. I am looking for a clean way to pass the tidyselected columns from `bar` to `foo`. Sounds easy, but the problem is that `foo` needs to accept bare user input, which will be quoted, and it also needs to accept already quoted columns, which come from the wrapper function.
So let's have a look at the problem:
library(dplyr)
myiris <- as_tibble(iris)
# this is a minimal function supporting tidyselect
# it's a toy function which just returns the tidyselected columns
foo <- function(cols){
data <- cur_data()
vars <- tidyselect::eval_select(rlang::enquo(cols), data)
out <- data[, vars]
names(out) <- paste0("new_", names(out))
out
}
# the function is working:
myiris %>%
mutate(foo(c(Sepal.Length)))
#> # A tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_Sepal.Length
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 5.1
#> 2 4.9 3 1.4 0.2 setosa 4.9
#> 3 4.7 3.2 1.3 0.2 setosa 4.7
#> 4 4.6 3.1 1.5 0.2 setosa 4.6
#> 5 5 3.6 1.4 0.2 setosa 5
#> 6 5.4 3.9 1.7 0.4 setosa 5.4
#> 7 4.6 3.4 1.4 0.3 setosa 4.6
#> 8 5 3.4 1.5 0.2 setosa 5
#> 9 4.4 2.9 1.4 0.2 setosa 4.4
#> 10 4.9 3.1 1.5 0.1 setosa 4.9
#> # … with 140 more rows
# this is a wrapper function around `foo`
bar <- function(df, .cols) {
.cols <- rlang::enquo(.cols)
mutate(df, foo(.cols))
}
# this will throw an error
myiris %>%
bar(Sepal.Length)
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(.cols)` instead of `.cols` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> Error: Problem with `mutate()` input `..1`.
#> x Must subset columns with a valid subscript vector.
#> x Subscript has the wrong type `quosure/formula`.
#> ℹ It must be numeric or character.
#> ℹ Input `..1` is `foo(.cols)`.
Created on 2021-04-14 by the reprex package (v0.3.0)
It makes total sense that the above doesn't work. What is not obvious to me is how to handle this problem in a clean and consistent way.
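The failure can be reproduced with tidyselect alone. The sketch below mimics what `foo` sees when it is called from `bar`: `enquo(cols)` captures only the bare symbol `.cols`, and evaluating that symbol yields a quosure, which is not a valid column subscript. Handing over the quosure object itself works. The objects `.cols` and `seen_by_foo` here are illustrative stand-ins, not part of the question's code.

```r
library(rlang)
library(tidyselect)

# Stand-in for the quosure built in `bar` by rlang::enquo(.cols)
.cols <- quo(Sepal.Length)

# Stand-in for what enquo(cols) captures inside `foo` when `bar`
# calls foo(.cols): the bare symbol `.cols`, not the selection itself
seen_by_foo <- quo(.cols)

# Evaluating that symbol yields a quosure, which tidyselect rejects
failed <- inherits(try(eval_select(seen_by_foo, iris), silent = TRUE), "try-error")
failed  # TRUE

# Passing the quosure itself selects the column as expected
eval_select(.cols, iris)  # named integer: Sepal.Length = 1
```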
Below I show what I have tried and what kind of mediocre workaround I came up with.
What I thought I could do is check whether the columns are already quoted and, if not, `enquo()` them. However, this does not seem to be possible. As soon as the unquoted columns are used in any kind of operation, they are evaluated and `cols` then holds the resulting values instead of the expression. The `enquo` has to happen first, but if it happens first, I can't check whether they have already been quoted.
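A minimal illustration of this forcing problem, assuming current rlang behaviour where `enquo()` applied to an argument that has already been evaluated can only return the value, wrapped as a literal. The two helpers are hypothetical and exist only for this comparison.

```r
library(rlang)

# Hypothetical helper: captures the argument before touching it
capture_first <- function(x) {
  enquo(x)        # the promise is still unevaluated, so the expression is kept
}

# Hypothetical helper: inspects the argument first, then tries to capture it
inspect_then_capture <- function(x) {
  is_quosure(x)   # any use of `x` forces (evaluates) the promise ...
  enquo(x)        # ... so enquo() can only capture the resulting value
}

quo_get_expr(capture_first(1 + 2))         # 1 + 2  (the expression)
quo_get_expr(inspect_then_capture(1 + 2))  # 3      (a plain value)
```

The same thing happens to `c(Sepal.Length)` in `foo`: the `is_quosure()` check evaluates it in the data mask to a numeric vector, which matches the `<double>` subscript error that `eval_select()` reports.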
# we would need to check in foo
# whether cols is already quoted or not,
# but this does not seem to be possible,
# since `cols` is evaluated as soon as it is used / touched
foo <- function(cols){
data <- cur_data()
if (!rlang::is_quosure(cols)) {
cols <- enquo(cols)
}
vars <- tidyselect::eval_select(cols, data)
out <- data[, vars]
names(out) <- paste0("new_", names(out))
out
}
# not working
iris %>%
mutate(foo(c(Sepal.Length)))
#> Error: Problem with `mutate()` input `..1`.
#> x Must subset columns with a valid subscript vector.
#> x Can't convert from <double> to <integer> due to loss of precision.
#> ℹ Input `..1` is `foo(c(Sepal.Length))`.
Created on 2021-04-14 by the reprex package (v0.3.0)
At the moment I am using a workaround that I don't like very much. I use the ellipsis `...` in `foo` so that I can call it with an additional argument which does not need to be documented. Now `foo` can be called with a `flag` argument, and in that case `foo` knows that the columns don't have to be quoted.
However, I don't think this is a clean solution. I would prefer some kind of function which quotes the columns if they are not already quoted, or a function which restores the environment of the column names when they are passed to `bar`.
One other possible solution would be to first evaluate the columns in `bar` and then pass the column names as strings to `foo`. I haven't tried that; it should work, since tidyselect accepts strings. However, I would like to avoid evaluating the column names in `bar`, as it doesn't seem very performant.
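A sketch of that string-based route, which the question has not tried, assuming `foo` is defined as at the top of the question. `bar_strings` is a hypothetical name. `all_of()` is tidyselect's helper for selecting by a character vector, which also avoids the "external vector" note seen earlier.

```r
library(dplyr)

# `foo` as defined in the question
foo <- function(cols) {
  data <- cur_data()
  vars <- tidyselect::eval_select(rlang::enquo(cols), data)
  out <- data[, vars]
  names(out) <- paste0("new_", names(out))
  out
}

# Hypothetical wrapper: resolve the selection once against `df`,
# then forward plain column names to `foo`, wrapped in all_of()
bar_strings <- function(df, .cols) {
  col_names <- names(tidyselect::eval_select(rlang::enquo(.cols), df))
  mutate(df, foo(all_of(col_names)))
}

res <- bar_strings(as_tibble(iris), c(Sepal.Length, Petal.Width))
names(res)
```

Resolving the selection in the wrapper works mostly on the column names rather than the column values (helpers such as `where()` do inspect the data), so the overhead is likely small for typical selections.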
Any other ideas are welcome.
# workaround using `...`
foo <- function(cols, ...){
dots <- rlang::list2(...)
if (is.null(dots$flag)) {
cols <- enquo(cols)
}
data <- cur_data()
vars <- tidyselect::eval_select(cols, data)
out <- data[, vars]
names(out) <- paste0("new_", names(out))
out
}
bar <- function(df, .cols) {
.cols <- rlang::enquo(.cols)
mutate(df, foo(.cols, flag = TRUE))
}
# working
myiris %>%
mutate(foo(c(Sepal.Length)))
#> # A tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_Sepal.Length
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 5.1
#> 2 4.9 3 1.4 0.2 setosa 4.9
#> 3 4.7 3.2 1.3 0.2 setosa 4.7
#> 4 4.6 3.1 1.5 0.2 setosa 4.6
#> 5 5 3.6 1.4 0.2 setosa 5
#> 6 5.4 3.9 1.7 0.4 setosa 5.4
#> 7 4.6 3.4 1.4 0.3 setosa 4.6
#> 8 5 3.4 1.5 0.2 setosa 5
#> 9 4.4 2.9 1.4 0.2 setosa 4.4
#> 10 4.9 3.1 1.5 0.1 setosa 4.9
#> # … with 140 more rows
# working
myiris %>%
bar(Sepal.Length)
#> # A tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_Sepal.Length
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 5.1
#> 2 4.9 3 1.4 0.2 setosa 4.9
#> 3 4.7 3.2 1.3 0.2 setosa 4.7
#> 4 4.6 3.1 1.5 0.2 setosa 4.6
#> 5 5 3.6 1.4 0.2 setosa 5
#> 6 5.4 3.9 1.7 0.4 setosa 5.4
#> 7 4.6 3.4 1.4 0.3 setosa 4.6
#> 8 5 3.4 1.5 0.2 setosa 5
#> 9 4.4 2.9 1.4 0.2 setosa 4.4
#> 10 4.9 3.1 1.5 0.1 setosa 4.9
#> # … with 140 more rows
Created on 2021-04-14 by the reprex package (v0.3.0)
Maybe I don't understand the use case, but why do the columns have to be quoted when you pass them from `bar()` to `foo()`? If you unquote the input, everything works as intended:
bar <- function(df, .cols) {
.cols <- rlang::enquo(.cols)
mutate(df, foo(!!.cols)) # <--- unquote before passing to foo()
}
# Or alternatively
bar <- function(df, .cols) {mutate(df, foo( {{.cols}} ))}
myiris %>%
bar(Sepal.Length) # works
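A self-contained check of the `{{ }}` version, assuming dplyr 1.0 or later (for `cur_data()` and the splicing of unnamed data-frame results in `mutate()`; from dplyr 1.1.0 `cur_data()` is deprecated in favour of `pick()` but should still run). It also shows the underlying rlang behaviour: `enquo()` processes `!!` in the argument it captures, so a quosure injected by the caller is returned as-is rather than quoted a second time. The helpers `capture` and `forward` are hypothetical.

```r
library(dplyr)

# `foo` as defined in the question
foo <- function(cols) {
  data <- cur_data()
  vars <- tidyselect::eval_select(rlang::enquo(cols), data)
  out <- data[, vars]
  names(out) <- paste0("new_", names(out))
  out
}

# The embracing version of the wrapper from this answer
bar <- function(df, .cols) {
  mutate(df, foo({{ .cols }}))
}

# Bare names, c() and selection helpers all pass through unchanged
res1 <- bar(as_tibble(iris), Sepal.Length)
res2 <- bar(as_tibble(iris), c(Sepal.Length, starts_with("Petal")))

# The same mechanism at the rlang level (hypothetical helpers):
capture <- function(x) rlang::enquo(x)
forward <- function(y) {
  q <- rlang::enquo(y)   # quote once, in the outer function
  capture(!!q)           # inject; the inner enquo() returns `q` itself
}
rlang::quo_get_expr(forward(Sepal.Length))  # Sepal.Length
```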