
How can I vary part of a formula argument when wrapping models in tidyeval?


As far as I have seen, there are two ways of handling formula arguments when writing functions that wrap models. You can paste string versions of the formula together:

library(tidyverse)
run_model1 <- function(df, dep_str, ...){
  groupers <- enquos(...)
  formula <- dep_str %>% str_c("~ cty") %>% as.formula()
  df %>%
    group_by(!!!groupers) %>%
    do(model = lm(formula, data = .))
}
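For comparison, here is a minimal sketch of the string step on its own, outside the grouped pipeline. It also shows base R's `stats::reformulate()`, swapped in here as an alternative to pasting text by hand; it is not what `run_model1` uses. The fixed `cty` predictor mirrors the question.

```r
# Two ways to turn the string "hwy" into the formula hwy ~ cty.
# reformulate() is base R (stats); it is an alternative, not what run_model1 uses.
dep_str <- "hwy"

# 1. Paste the pieces into one string and parse it
#    (run_model1 does the same with str_c() instead of paste())
f_paste <- as.formula(paste(dep_str, "~ cty"))

# 2. Let reformulate() assemble it: term labels first, then the response
f_reform <- reformulate("cty", response = dep_str)

deparse(f_paste)   # "hwy ~ cty"
deparse(f_reform)  # "hwy ~ cty"
```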

Alternatively, you can quote the whole formula:

run_model2 <- function(df, formula, ...){
  groupers <- enquos(...)
  formula <- enexpr(formula)
  df %>%
    group_by(!!!groupers) %>%
    do(model = lm(!!formula, data = .))
}
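To see what `run_model2` actually hands to `lm()`, the sketch below captures the argument with `enexpr()` and splices it with `!!` into a quoted call without fitting anything. It assumes the rlang package is installed; `capture_formula` is a hypothetical helper.

```r
library(rlang)

# Hypothetical helper: capture the unevaluated formula argument,
# the same way run_model2 does with enexpr().
capture_formula <- function(formula) {
  enexpr(formula)
}

captured <- capture_formula(hwy ~ cty)
class(captured)                           # "call": not yet a formula object

# !! splices the captured call into a new (unevaluated) lm() call
deparse(expr(lm(!!captured, data = df)))  # "lm(hwy ~ cty, data = df)"
```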

Both approaches do let me fit grouped models while varying a variable in the formula:

run_model1(mpg, "hwy", cyl)
#> Source: local data frame [4 x 2]
#> Groups: <by row>
#> 
#> # A tibble: 4 x 2
#>     cyl model   
#> * <int> <list>  
#> 1     4 <S3: lm>
#> 2     5 <S3: lm>
#> 3     6 <S3: lm>
#> 4     8 <S3: lm>
run_model2(mpg, hwy ~ cty, cyl)
#> Source: local data frame [4 x 2]
#> Groups: <by row>
#> 
#> # A tibble: 4 x 2
#>     cyl model   
#> * <int> <list>  
#> 1     4 <S3: lm>
#> 2     5 <S3: lm>
#> 3     6 <S3: lm>
#> 4     8 <S3: lm>

However, the first requires awkwardly mixing quoted and unquoted arguments, and in particular does not work well if I later want to use the dependent variable as a symbol. The second forces me to supply the entire formula every time, when I would rather supply only one part.

Basically, how can I write a function that takes arguments like this?

run_model3(mpg, hwy, cyl)

Solution

  • ensym() should let you capture a symbol provided to the function.

    ensym() and ensyms() are variants of enexpr() and enexprs() that check the captured expression is either a string (which they convert to symbol) or a symbol. If anything else is supplied they throw an error.

    (Source: rlang documentation.)

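    run_model3 <- function(df, dep_str, ...) {
      # ensym() accepts a bare name (hwy) or a string ("hwy") and returns a symbol
      dep_str <- ensym(dep_str)
      groupers <- enquos(...)
      # Turn the symbol back into "hwy" so str_c() receives a character vector;
      # passing the bare symbol is not guaranteed to work in current stringr
      formula <- dep_str %>% rlang::as_string() %>% str_c(" ~ cty") %>% as.formula()
      df %>%
        group_by(!!!groupers) %>%
        do(model = lm(formula, data = .))
    }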
    

    > run_model3(mpg, hwy, cyl)
    Source: local data frame [4 x 2]
    Groups: <by row>
    
    # A tibble: 4 x 2
        cyl model   
    * <int> <list>  
    1     4 <S3: lm>
    2     5 <S3: lm>
    3     6 <S3: lm>
    4     8 <S3: lm>
    

    And because, as the quoted documentation says, ensym() also converts a string to a symbol, the string-based call style of run_model1 still works:

    > run_model3(mpg, "hwy", cyl)
    Source: local data frame [4 x 2]
    Groups: <by row>
    
    # A tibble: 4 x 2
        cyl model   
    * <int> <list>  
    1     4 <S3: lm>
    2     5 <S3: lm>
    3     6 <S3: lm>
    4     8 <S3: lm>
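If you want to skip the string round trip altogether, one option is rlang's `new_formula()`, which builds a formula directly from a symbol. The sketch below assumes rlang is installed; `make_formula` is a hypothetical helper and the fixed `cty` predictor mirrors the question. It also shows the three cases from the quoted documentation: a bare name, a string, and anything else (which errors).

```r
library(rlang)

# Hypothetical helper: capture the dependent variable with ensym() and build
# the formula from the symbol itself, with no paste/parse step.
make_formula <- function(dep) {
  dep <- ensym(dep)             # accepts hwy or "hwy"; errors on anything else
  new_formula(dep, quote(cty))  # lhs = dep, rhs = cty, env = this function's frame
}

deparse(make_formula(hwy))      # "hwy ~ cty"
deparse(make_formula("hwy"))    # "hwy ~ cty"

# hwy + 1 is a call, neither a symbol nor a string, so ensym() throws an error
res <- tryCatch(make_formula(hwy + 1), error = function(e) "error")
res                             # "error"
```

Inside `run_model3`, the same idea would replace the `str_c()`/`as.formula()` line with something like `formula <- new_formula(dep_str, quote(cty))`, leaving the rest of the function unchanged.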