r, tidyverse, rlang, quasiquotes

Using quasiquotation in functions with a formula interface


I want to write a custom function that accepts both bare and "string" inputs, and that works with functions both with and without a formula interface.

custom function example

# setup
set.seed(123)
library(tidyverse)

# custom function
foo <- function(data, x, y) {
  # function without formula
  print(table(data %>% dplyr::pull({{ x }}), data %>% dplyr::pull({{ y }})))

  # function with formula
  print(
    broom::tidy(stats::t.test(
      formula = rlang::new_formula({{ rlang::ensym(y) }}, {{ rlang::ensym(x) }}),
      data = data
    ))
  )
}

bare

works for both functions with and without formula interface

foo(mtcars, am, cyl)
#>    
#>      4  6  8
#>   0  3  4 12
#>   1  8  3  2

#> # A tibble: 1 x 10
#>   estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#>      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl>
#> 1     1.87      6.95      5.08      3.35 0.00246      25.9    0.724      3.02
#> # ... with 2 more variables: method <chr>, alternative <chr>

string

works for both functions with and without formula interface

foo(mtcars, "am", "cyl")
#>    
#>      4  6  8
#>   0  3  4 12
#>   1  8  3  2

#> # A tibble: 1 x 10
#>   estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
#>      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl>
#> 1     1.87      6.95      5.08      3.35 0.00246      25.9    0.724      3.02
#> # ... with 2 more variables: method <chr>, alternative <chr>

as colnames

works only for functions without the formula interface

foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
#>    
#>      4  6  8
#>   0  3  4 12
#>   1  8  3  2

#> Error: Only strings can be converted to symbols
#> Backtrace:
#>     x
#>  1. \-global::foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
#>  2.   +-base::print(...)
#>  3.   +-broom::tidy(...)
#>  4.   +-stats::t.test(...)
#>  5.   +-rlang::new_formula(...)
#>  6.   \-rlang::ensym(y)

How can I modify the original function so that it will work with all the above-mentioned ways of entering the inputs and for both kinds of functions used?


Solution

  • The nice philosophy of rlang is that you get to control when values are evaluated, via the !! and {{ }} operators. You seem to want a function that takes strings, symbols, and (possibly evaluated) expressions, all in the same parameter. Handling symbols or strings is actually easy with ensym; the problem is also wanting to allow code like colnames(mtcars)[9], which has to be evaluated before it returns a string. This can be quite confusing. For example, what behavior do you expect when you run the following?

    am <- 'disp'
    cyl <- 'gear'
    foo(mtcars, am, cyl)
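
    To make the ambiguity concrete, here is a minimal sketch of what ensym does in that situation. captured_name is a hypothetical helper introduced only for illustration; it is not part of the question's code.

    ```r
    library(rlang)

    # Hypothetical helper: report the name that ensym() captures for an argument
    captured_name <- function(x) {
      rlang::as_string(rlang::ensym(x))
    }

    am <- "disp"

    # The symbol is captured as written; the variable's value is never looked at
    captured_name(am)

    # The caller explicitly asks for the value by unquoting with !!
    captured_name(!!am)
    ```

    The first call yields "am" and the second yields "disp", so with ensym it is the caller who decides whether a name is used as written or evaluated first.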
    

    You could write a helper function if you want to assume that all "calls" should be evaluated but symbols and literals should not. Here's a "cleaner" function:

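    clean_quo <- function(x) {
      if (rlang::quo_is_call(x)) {
        # a call such as colnames(mtcars)[9]: evaluate it to get its value
        x <- rlang::eval_tidy(x)
      } else if (!rlang::quo_is_symbolic(x)) {
        # a literal such as "am": unwrap it from the quosure
        x <- rlang::quo_get_expr(x)
      }
      # a string (literal or evaluated) becomes a symbol
      if (is.character(x)) x <- rlang::sym(x)
      # always return a quosure so every input kind is handled the same way
      if (!rlang::is_quosure(x)) x <- rlang::new_quosure(x)
      x
    }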
    

    and you could use that in your function like this:

    foo <- function(data, x, y) {
      x <- clean_quo(rlang::enquo(x))
      y <- clean_quo(rlang::enquo(y))
    
      # function without formula
      print(table(data %>% dplyr::pull(!!x), data %>% dplyr::pull(!!y)))
    
      # function with formula
      print(
        broom::tidy(stats::t.test(
          formula = rlang::new_formula(rlang::quo_get_expr(y), rlang::quo_get_expr(x)),
          data = data
        ))
      )
    }
    

    With that, all of these calls return the same values:

    foo(mtcars, am, cyl)
    foo(mtcars, "am", "cyl")
    foo(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
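
    As a quick check that does not need dplyr or broom, the sketch below looks only at which column name clean_quo settles on for each kind of input. resolved_name is a hypothetical wrapper added for illustration, and clean_quo is repeated so the snippet runs on its own.

    ```r
    library(rlang)

    # clean_quo() repeated from the answer so this snippet is self-contained
    clean_quo <- function(x) {
      if (rlang::quo_is_call(x)) {
        x <- rlang::eval_tidy(x)
      } else if (!rlang::quo_is_symbolic(x)) {
        x <- rlang::quo_get_expr(x)
      }
      if (is.character(x)) x <- rlang::sym(x)
      if (!rlang::is_quosure(x)) x <- rlang::new_quosure(x)
      x
    }

    # Hypothetical wrapper: return the column name that clean_quo() resolves to
    resolved_name <- function(x) {
      q <- clean_quo(rlang::enquo(x))
      rlang::as_string(rlang::quo_get_expr(q))
    }

    resolved_name(am)                   # bare symbol
    resolved_name("am")                 # literal string
    resolved_name(colnames(mtcars)[9])  # call, evaluated first
    ```

    All three calls resolve to the same name, "am". Note that any call is evaluated, so something like log(mpg) would be evaluated before any data masking happens, which is probably not what the caller meant.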
    

    But you are probably just delaying other possible problems. I would not recommend over-interpreting user intentions with this kind of code. That's why it's better to let users explicitly un-escape (unquote) the arguments themselves. Perhaps provide two different versions of the function: one for parameters that require evaluation and one for those that do not.
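
    A minimal sketch of that alternative, assuming only the formula part of the question's function matters here. The names t_test_quoted and t_test_strings are hypothetical and are not taken from the question.

    ```r
    library(rlang)

    # Hypothetical variant 1: x and y are quoted. Bare names and strings work
    # directly; anything that must be evaluated first is unquoted by the caller.
    t_test_quoted <- function(data, x, y) {
      x <- rlang::ensym(x)
      y <- rlang::ensym(y)
      stats::t.test(rlang::new_formula(y, x), data = data)
    }

    # Hypothetical variant 2: x and y are ordinary arguments that must
    # evaluate to single strings.
    t_test_strings <- function(data, x, y) {
      stopifnot(rlang::is_string(x), rlang::is_string(y))
      stats::t.test(rlang::new_formula(rlang::sym(y), rlang::sym(x)), data = data)
    }

    r1 <- t_test_quoted(mtcars, am, cyl)
    r2 <- t_test_quoted(mtcars, "am", "cyl")
    r3 <- t_test_quoted(mtcars, !!colnames(mtcars)[9], !!colnames(mtcars)[2])
    r4 <- t_test_strings(mtcars, colnames(mtcars)[9], colnames(mtcars)[2])
    ```

    Here the !! in the third call makes the evaluation explicit at the call site, so the function never has to guess whether an argument is a column name or code that produces one.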