r, dplyr, rlang, tidyselect

Using quoted variables in a custom dplyr wrapper function


My problem is the following. I have a function foo that works inside dplyr::mutate and accepts tidyselect syntax. I want to build a wrapper function bar that also supports tidyselect syntax, and I am looking for a clean way to pass the tidyselected columns from bar to foo. It sounds easy, but foo needs to accept bare user input, which it quotes itself, and it also needs to accept already-quoted columns coming from the wrapper function.

So let's have a look at the problem:

library(dplyr)

myiris <- as_tibble(iris)

# this is a minimal function supporting tidyselect
# it's a toy function that just returns the tidyselected columns

foo <- function(cols){
  data <- cur_data()
  vars <- tidyselect::eval_select(rlang::enquo(cols),  data)
  out <- data[, vars]
  
  names(out) <- paste0("new_", names(out))
  out
}

# the function is working:
myiris %>%
  mutate(foo(c(Sepal.Length)))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_Sepal.Length
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>              <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa               5.1
#>  2          4.9         3            1.4         0.2 setosa               4.9
#>  3          4.7         3.2          1.3         0.2 setosa               4.7
#>  4          4.6         3.1          1.5         0.2 setosa               4.6
#>  5          5           3.6          1.4         0.2 setosa               5  
#>  6          5.4         3.9          1.7         0.4 setosa               5.4
#>  7          4.6         3.4          1.4         0.3 setosa               4.6
#>  8          5           3.4          1.5         0.2 setosa               5  
#>  9          4.4         2.9          1.4         0.2 setosa               4.4
#> 10          4.9         3.1          1.5         0.1 setosa               4.9
#> # … with 140 more rows

# this is a wrapper function around `foo`
bar <- function(df, .cols) {
  .cols <- rlang::enquo(.cols)
  mutate(df, foo(.cols))
}

# this will throw an error
myiris %>%
  bar(Sepal.Length)

#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(.cols)` instead of `.cols` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> Error: Problem with `mutate()` input `..1`.
#> x Must subset columns with a valid subscript vector.
#> x Subscript has the wrong type `quosure/formula`.
#> ℹ It must be numeric or character.
#> ℹ Input `..1` is `foo(.cols)`.

Created on 2021-04-14 by the reprex package (v0.3.0)

It makes total sense that the above doesn't work. What is not obvious to me is how to handle this problem in a clean and consistent way.

Below I show what I have tried and what kind of mediocre workaround I came up with.

My first idea was to check whether the columns are already quoted and, if not, to quote them with enquo. However, this does not seem to be possible: as soon as the unquoted argument is used in any operation, it is evaluated and its value changes. The enquo call has to happen first, but if it happens first, I can no longer check whether the columns were already quoted.

# we would need to check in foo
# if cols is already quoted or not
# but this seems not to be possible
# since `cols` changes, once it is used / touched

foo <- function(cols){
  data <- cur_data()
  if (!rlang::is_quosure(cols)) {
    cols <- enquo(cols)
  }
  vars <- tidyselect::eval_select(cols, data)
  out <- data[, vars]
  
  names(out) <- paste0("new_", names(out))
  out
}

# not working
iris %>%
  mutate(foo(c(Sepal.Length)))
#> Error: Problem with `mutate()` input `..1`.
#> x Must subset columns with a valid subscript vector.
#> x Can't convert from <double> to <integer> due to loss of precision.
#> ℹ Input `..1` is `foo(c(Sepal.Length))`.

At the moment I am using a workaround that I don't like very much. I add the ellipsis ... to foo so that it can be called with an additional argument that does not need to be documented. When foo is called with a flag argument, it knows that the columns are already quoted and must not be quoted again.

However, I don't think this is a clean solution. I would prefer some kind of function that quotes only if the input is not already quoted, or a function that restores the environment of the column names when they are passed to bar.

One other possible solution would be to first evaluate the columns in bar and then pass the column names as strings to foo. I haven't tried that; it should work, since tidyselect accepts strings. However, I would like to avoid evaluating the column names in bar, as it doesn't seem very performant.
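For illustration, the string-based idea could look roughly like the sketch below. This is untested against the real use case; bar_names is a hypothetical name, foo is the toy function from the first example, and cur_data() is kept from that example even though newer dplyr versions deprecate it in favour of pick().

```r
library(dplyr)

myiris <- as_tibble(iris)

# Same toy `foo()` as in the first example.
foo <- function(cols) {
  data <- cur_data()
  vars <- tidyselect::eval_select(rlang::enquo(cols), data)
  out <- data[, vars]
  names(out) <- paste0("new_", names(out))
  out
}

# Hypothetical wrapper: resolve the selection once against `df`,
# then hand plain column names (strings) to `foo()`.
bar_names <- function(df, .cols) {
  nms <- names(tidyselect::eval_select(rlang::enquo(.cols), df))
  # all_of() marks `nms` as an external character vector of column names
  mutate(df, foo(all_of(nms)))
}

res <- bar_names(myiris, Sepal.Length)
names(res)
```

The selection is evaluated twice here (once in bar_names, once in foo), which is the overhead mentioned above.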

Any other ideas welcome.

# workaround using `...`
foo <- function(cols, ...){
  
  dots <- rlang::list2(...)
  if (is.null(dots$flag)) {
    cols <- enquo(cols)
  }
  
  data <- cur_data()
  vars <- tidyselect::eval_select(cols, data)
  out <- data[, vars]
  
  names(out) <- paste0("new_", names(out))
  out
}

bar <- function(df, .cols) {
  .cols <- rlang::enquo(.cols)
  mutate(df, foo(.cols, flag = TRUE))
}


# working
myiris %>%
  mutate(foo(c(Sepal.Length)))

#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_Sepal.Length
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>              <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa               5.1
#>  2          4.9         3            1.4         0.2 setosa               4.9
#>  3          4.7         3.2          1.3         0.2 setosa               4.7
#>  4          4.6         3.1          1.5         0.2 setosa               4.6
#>  5          5           3.6          1.4         0.2 setosa               5  
#>  6          5.4         3.9          1.7         0.4 setosa               5.4
#>  7          4.6         3.4          1.4         0.3 setosa               4.6
#>  8          5           3.4          1.5         0.2 setosa               5  
#>  9          4.4         2.9          1.4         0.2 setosa               4.4
#> 10          4.9         3.1          1.5         0.1 setosa               4.9
#> # … with 140 more rows

# working
myiris %>%
  bar(Sepal.Length)

#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_Sepal.Length
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>              <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa               5.1
#>  2          4.9         3            1.4         0.2 setosa               4.9
#>  3          4.7         3.2          1.3         0.2 setosa               4.7
#>  4          4.6         3.1          1.5         0.2 setosa               4.6
#>  5          5           3.6          1.4         0.2 setosa               5  
#>  6          5.4         3.9          1.7         0.4 setosa               5.4
#>  7          4.6         3.4          1.4         0.3 setosa               4.6
#>  8          5           3.4          1.5         0.2 setosa               5  
#>  9          4.4         2.9          1.4         0.2 setosa               4.4
#> 10          4.9         3.1          1.5         0.1 setosa               4.9
#> # … with 140 more rows


Solution

  • Maybe I don't understand the use case, but why do the columns have to be quoted when you pass them from bar() to foo()? If you unquote the input, everything works as intended:

    bar <- function(df, .cols) {
      .cols <- rlang::enquo(.cols)
      mutate(df, foo(!!.cols))      # <--- unquote before passing to foo()
    }
    
    # Or alternatively
    bar <- function(df, .cols) {
      mutate(df, foo({{ .cols }}))  # {{ }} quotes and unquotes in one step
    }
    
    myiris %>%
      bar(Sepal.Length)             # works
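
To see why unquoting helps, it may be useful to look at what the inner function actually captures. The sketch below uses only rlang; inner, outer_plain, outer_bang and outer_curly are hypothetical stand-ins for foo() and three variants of bar(), not part of the original code.

```r
library(rlang)

# Stand-in for `foo()`: just return whatever enquo() captures.
inner <- function(cols) enquo(cols)

# Passes the local variable itself: `inner()` captures the symbol `.cols`,
# which later evaluates to a quosure object instead of a column selection.
outer_plain <- function(.cols) {
  .cols <- enquo(.cols)
  inner(.cols)
}

# Unquotes before the call: `inner()` captures the user's original expression.
outer_bang <- function(.cols) {
  .cols <- enquo(.cols)
  inner(!!.cols)
}

# `{{ }}` combines the enquo() and `!!` steps.
outer_curly <- function(.cols) inner({{ .cols }})

plain_expr <- quo_get_expr(outer_plain(Sepal.Length))
bang_expr  <- quo_get_expr(outer_bang(Sepal.Length))
curly_expr <- quo_get_expr(outer_curly(Sepal.Length))
```

With the plain call, the captured expression is the symbol .cols, which matches the "wrong type `quosure/formula`" error in the question. With `!!` or `{{ }}`, the captured expression is Sepal.Length, so foo() sees the same input as when it is called directly.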