Tags: r, dplyr, rlang, tidyeval

Passing variable names from formula into tidy function with rlang


I've been reading through the tidy evaluation book, several questions, and a few other resources and I still can't find a straight answer to this.

Suppose I have a formula object in R, say x ~ y + z, and a function f that I define as follows.

f <- function(.data, a_formula) { .data %>% select(SOMETHING HERE) }

I want evaluating f(df, a_formula) to give the same result as df %>% select(x, y, z) without having to manually specify the formula components.

I know rlang has a few functions to deal with formulas, and there is all.vars() from base R, but I am not sure what I need to put in the internal select() call within f() for this to evaluate how I want. Is the best answer to just do basic string manipulation until I get what I want? I would prefer some way for the tidy evaluation machinery to handle this behind the scenes, so that, for example, f(df, formula(x ~ .)) would result in df %>% select(x, everything()), as usual for dplyr.

So basically what I am asking is, is there a way for me to extract the variables from a formula and have select() evaluate them as normal?


Solution

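  • You've basically already found all the parts you need; you just need to put them together with the all_of() helper function. all.vars() returns the variable names in the formula as a character vector, and all_of() tells select() to treat that vector as column names. For example: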

    library(dplyr)
    
    f <- function(.data, a_formula) { 
      .data %>% select(all_of(all.vars(a_formula))) 
    }
    
    # test with built-in iris data frame
    f(iris, Sepal.Length ~ Sepal.Width)
    

    If you want special handling for a . in the formula, add a check for it. Here the whole data frame is returned unchanged:

    f <- function(.data, a_formula) {
      # all.vars() reports "." as a variable name; treat it as "all columns"
      if ("." %in% all.vars(a_formula)) return(.data)
      .data %>% select(all_of(all.vars(a_formula)))
    }
    f(iris, Sepal.Length ~ .)
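
Note that returning .data as-is keeps the original column order, which is not quite the select(x, everything()) behaviour asked for in the question. A possible variant is sketched below. It is only a sketch: the name f_everything and the local variable vars are made up for illustration, and it assumes that . should mean "all remaining columns, after the ones named explicitly".

```r
library(dplyr)

# Hypothetical variant of f(): named variables first, then everything else
f_everything <- function(.data, a_formula) {
  vars <- all.vars(a_formula)
  if ("." %in% vars) {
    # Drop the "." placeholder and let everything() fill in the remaining columns
    .data %>% select(all_of(setdiff(vars, ".")), everything())
  } else {
    .data %>% select(all_of(vars))
  }
}

# Species is moved to the front; the other iris columns keep their order
names(f_everything(iris, Species ~ .))
```

Because all_of() is strict, a formula that names a column missing from the data frame raises an error rather than being silently ignored.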