As far as I have seen there are two ways of dealing with formula arguments when you want to make functions that wrap models. You can paste string versions of formula together:
library(tidyverse)
run_model1 <- function(df, dep_str, ...) {
  groupers <- enquos(...)
  formula <- dep_str %>% str_c("~ cty") %>% as.formula()
  df %>%
    group_by(!!!groupers) %>%
    do(model = lm(formula, data = .))
}
or you can quote the whole formula:
run_model2 <- function(df, formula, ...) {
  groupers <- enquos(...)
  formula <- enexpr(formula)
  df %>%
    group_by(!!!groupers) %>%
    do(model = lm(!!formula, data = .))
}
both of which do in fact allow me to get grouped models while varying a variable in the formula.
run_model1(mpg, "hwy", cyl)
#> Source: local data frame [4 x 2]
#> Groups: <by row>
#>
#> # A tibble: 4 x 2
#> cyl model
#> * <int> <list>
#> 1 4 <S3: lm>
#> 2 5 <S3: lm>
#> 3 6 <S3: lm>
#> 4 8 <S3: lm>
run_model2(mpg, hwy ~ cty, cyl)
#> Source: local data frame [4 x 2]
#> Groups: <by row>
#>
#> # A tibble: 4 x 2
#> cyl model
#> * <int> <list>
#> 1 4 <S3: lm>
#> 2 5 <S3: lm>
#> 3 6 <S3: lm>
#> 4 8 <S3: lm>
However, the first requires an awkward mix of quoted and unquoted arguments, and it works poorly if I want access to the symbol version of the variable for later use. The second forces me to supply the entire formula every time, when I'd rather supply only one part.
Basically, how can I get a function that would take arguments like this?
run_model3(mpg, hwy, cyl)
ensym() should let you capture a symbol provided to the function. From the documentation:
ensym() and ensyms() are variants of enexpr() and enexprs() that check the captured expression is either a string (which they convert to symbol) or a symbol. If anything else is supplied they throw an error.
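To see that behavior in isolation, here is a minimal sketch. The helper name capture_dep is hypothetical and exists only for illustration; it assumes the rlang package is installed.

```r
library(rlang)

# Hypothetical helper: capture the argument and return it as a symbol.
capture_dep <- function(dep) {
  ensym(dep)
}

from_name   <- capture_dep(hwy)    # a bare name is captured as a symbol
from_string <- capture_dep("hwy")  # a string is converted to the same symbol

# Anything that is neither a symbol nor a string (here, a call) raises an error.
from_call <- tryCatch(capture_dep(hwy + 1), error = function(e) "error")
```

Both the bare-name and the string call yield the symbol hwy, which is why a wrapper built on ensym() can accept either style of argument.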
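run_model3 <- function(df, dep_str, ...) {
  # ensym() accepts a bare name or a string and returns a symbol;
  # convert it to a string explicitly before pasting the formula together
  dep_str <- rlang::as_string(ensym(dep_str))
  groupers <- enquos(...)
  formula <- dep_str %>% str_c(" ~ cty") %>% as.formula()
  df %>%
    group_by(!!!groupers) %>%
    do(model = lm(formula, data = .))
}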
> run_model3(mpg, hwy, cyl)
Source: local data frame [4 x 2]
Groups: <by row>
# A tibble: 4 x 2
cyl model
* <int> <list>
1 4 <S3: lm>
2 5 <S3: lm>
3 6 <S3: lm>
4 8 <S3: lm>
And, based on the quote above, the string form of the call used with run_model1 still works:
> run_model3(mpg, "hwy", cyl)
Source: local data frame [4 x 2]
Groups: <by row>
# A tibble: 4 x 2
cyl model
* <int> <list>
1 4 <S3: lm>
2 5 <S3: lm>
3 6 <S3: lm>
4 8 <S3: lm>
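If you would rather not paste strings at all, one possible variation is to keep the captured symbol and hand it to base R's reformulate(). This is only a sketch: the helper name make_formula is hypothetical, the predictor is hard-coded to cty as in the functions above, and rlang is assumed to be installed.

```r
library(rlang)

# Hypothetical helper: build `<dep> ~ cty` from a bare name or a string.
make_formula <- function(dep) {
  dep <- ensym(dep)                  # symbol, still available for later use
  reformulate("cty", response = dep) # response may be a symbol or a string
}

f_name   <- make_formula(hwy)
f_string <- make_formula("hwy")
```

The resulting formula could then be passed to lm() inside the grouped do() call in place of the pasted one, while dep remains a symbol for anything else the wrapper needs to do with it.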