This question is closely related to "R - how to pass formula to a with(df, glm(y ~ x)) construction inside a function", but asks a broader question.
Why do these expressions work?
text_obj <- "mpg ~ cyl"
form_obj <- as.formula(text_obj)
with(mtcars, lm(mpg ~ cyl))
with(mtcars, lm(as.formula(text_obj)))
lm(form_obj, data = mtcars)
But not this one?
with(mtcars, lm(form_obj))
Error in eval(predvars, data, env) : object 'mpg' not found
I would usually use the data argument, but this is not possible with the mice package. For example:
library(mice)
mtcars[5, 5] <- NA # introduce a missing value to be imputed
mtcars.imp = mice(mtcars, m = 5)
These don't work
lm(form_obj, data = mtcars.imp)
with(mtcars.imp, lm(form_obj))
but this does
with(mtcars.imp, lm(as.formula(text_obj)))
So, is it better to always call as.formula inside the function, rather than construct the formula first and then pass it in?
An important "hidden" aspect of formulas is their associated environment.
When form_obj is created, its environment is set to the environment in which it was created:
environment(form_obj)
# <environment: R_GlobalEnv>
In the with() versions that work, the formula is created inside with(), so its environment is the temporary environment that with() builds from the data. It's easiest to see this with the as.formula approach by splitting it into a few steps:
with(mtcars, {
f = as.formula(text_obj)
print(environment(f))
lm(f)
})
# <environment: 0x7fbb68b08588>
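To make the difference concrete, here is a minimal sketch (base R only, using the built-in mtcars data) that checks whether each formula's environment actually contains the column mpg. The names outer_sees, inner_env and inner_sees are introduced purely for illustration, and the sketch assumes no object called mpg exists in the environment where the script runs.

```r
text_obj <- "mpg ~ cyl"
form_obj <- as.formula(text_obj)  # environment: wherever this line runs

# Does the environment of the formula built outside with() hold 'mpg'?
outer_sees <- exists("mpg", envir = environment(form_obj), inherits = FALSE)

# Build the formula inside with() and grab its environment
inner_env <- with(mtcars, {
  f <- as.formula(text_obj)
  environment(f)
})

# Does the temporary environment created by with() hold 'mpg'?
inner_sees <- exists("mpg", envir = inner_env, inherits = FALSE)

print(c(outer = outer_sees, inner = inner_sees))
```

Only the second environment holds the data columns, which is why lm() can resolve mpg for the formula built inside with() but not for form_obj.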
We can make the form_obj approach work by editing its environment before calling lm:
with(mtcars, {
# set form_obj's environment to the current one
environment(form_obj) = environment()
lm(form_obj)
})
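As a quick sanity check, the sketch below (base R only; fit_data, fit_with and env_before are illustrative names, not from the original post) compares the fit obtained this way with the usual data-argument fit. It also shows that the assignment inside with() works on a local copy, so the original form_obj keeps its environment.

```r
text_obj <- "mpg ~ cyl"
form_obj <- as.formula(text_obj)
env_before <- environment(form_obj)

# Reference fit using the data argument
fit_data <- lm(form_obj, data = mtcars)

# Same model fitted inside with(), after re-pointing the formula's environment
fit_with <- with(mtcars, {
  environment(form_obj) <- environment()  # modifies a copy local to with()
  lm(form_obj)
})

# Both approaches should give the same coefficients
print(all.equal(coef(fit_data), coef(fit_with)))

# The original form_obj still has the environment it was created in
print(identical(environment(form_obj), env_before))
```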
The help page for ?formula is a bit long, but there's a section on environments:
Environments
A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument. Formulas created with the ~ operator use the environment in which they were created. Formulas created with as.formula will use the env argument for their environment.
The upshot is that making a formula with ~ sweeps the environment part "under the rug"; in more general settings, it's safer to use as.formula, which gives you fuller control over the environment to which the formula applies.
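One way to use that control, sketched under the assumption that the formula arrives as text: a small wrapper (fit_from_text is a hypothetical helper, not part of any package) that builds the formula inside with() and sets env explicitly, so the formula's environment is the one holding the data columns. Only base R is used here; whether the same pattern suits a given mice workflow is not tested in this sketch.

```r
# Hypothetical helper: build the formula from text inside with(), passing
# the current (temporary) environment explicitly via the env argument.
fit_from_text <- function(data, text) {
  with(data, {
    f <- as.formula(text, env = environment())
    lm(f)
  })
}

fit <- fit_from_text(mtcars, "mpg ~ cyl")
print(coef(fit))
```

Passing env explicitly makes the dependency visible at the call site instead of relying on the default, which is the calling environment of as.formula.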
You might also check Hadley's chapter on environments: