I am making a function which takes unevaluated calls to regression functions as input, creates some data, and then evaluates the call. Here is an example:
library(lme4)
compute_fit <- function(m){
# Generate some data
df <- data.frame(x = rnorm(100), y = rnorm(100) + x, ID = sample(4, 100, replace = TRUE))
# Evaluate the call
eval(m, envir = df)
}
# Create a list of models
models <- list(
lm = call("lm", quote(list(formula = y ~ x))),
glm = call("glm", quote(list(formula = y ~ x))),
lmer = call("lmer", quote(list(formula = y ~ x + (1 | ID))))
)
# Evaluate the call (this works fine)
model_fits <- lapply(models, compute_fit)
My reason for doing this, is that I am conducting a simulation study, where I fit many different models on many Monte Carlo samples. The function is part of an internal package, and I want to provide the list of models, which are then evaluated inside the package.
I would like to also use the gam
function from mgcv
. In the documentation to gam, the following is said about its data
argument, which is practically equivalent to, e.g., the documentation to lm
:
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.
I hence try to use the same logic to compute a gam
, thinking that eval(m, envir = df)
in the compute_fit
function defined above should evaluate the formula in the environment of df
:
# Try with gam
library(mgcv)
gamcall = call("gam", quote(list(formula = y ~ x)))
compute_fit(gamcall)
However, this fails with the error message:
Error in eval(predvars, data, env): object 'y' not found
I realize that this error is likely related to this question, however my question is whether anyone can think of a workaround which lets me use gam
in the same way as I use the other modeling functions? To the best of my understanding, the linked question does not provide a solution to this question.
Here is a complete reprex:
set.seed(1)
library(lme4)
#> Loading required package: Matrix
compute_fit <- function(m){
# Generate some data
df <- data.frame(x = rnorm(100), ID = rep(1:50, 2))
df$y <- df$x + rnorm(100, sd = .1)
# Evaluate the call
eval(m, envir = df)
}
# Create a list of models
models <- list(
lm = call("lm", quote(list(formula = y ~ x))),
glm = call("glm", quote(list(formula = y ~ x))),
lmer = call("lmer", quote(list(formula = y ~ x + (1 | ID))))
)
# Evaluate the call (this works fine)
model_fits <- lapply(models, compute_fit)
# Try with gam
library(mgcv)
#> Loading required package: nlme
#>
#> Attaching package: 'nlme'
#> The following object is masked from 'package:lme4':
#>
#> lmList
#> This is mgcv 1.8-26. For overview type 'help("mgcv-package")'.
gamcall = call("gam", quote(list(formula = y ~ x)))
compute_fit(gamcall)
#> Error in eval(predvars, data, env): object 'y' not found
I would add df
to the call instead of evaluating within df
:
compute_fit <- function(m){
# Generate some data
set.seed(1)
df <- data.frame(x <- rnorm(100), y = rnorm(100) + x^3, ID = sample(4, 100, replace = TRUE))
#add data parameter to call
m[["data"]] <- quote(df)
# Evaluate the call
eval(m)
}
# Create a list of models
models <- list(
lm = quote(lm(formula = y ~ x)),
glm = quote(glm(formula = y ~ x)),
lmer = quote(lmer(formula = y ~ x + (1 | ID))),
gam = quote(gam(formula = y ~ s(x)))
)
model_fits <- lapply(models, compute_fit)
#works but lmer reports singular fit