Search code examples
rgammgcv

Non-standard evaluation with mgcv::gam


I am making a function which takes unevaluated calls to regression functions as input, creates some data, and then evaluates the call. Here is an example:

library(lme4)
compute_fit <- function(m){
  # Generate some data
  df <- data.frame(x = rnorm(100), y = rnorm(100) + x, ID = sample(4, 100, replace = TRUE))
  # Evaluate the call
  eval(m, envir = df)
}

# Create a list of models
models <- list(
  lm = call("lm", quote(list(formula = y ~ x))),
  glm = call("glm", quote(list(formula = y ~ x))),
  lmer = call("lmer", quote(list(formula = y ~ x + (1 | ID))))
)

# Evaluate the call (this works fine)
model_fits <- lapply(models, compute_fit)

My reason for doing this, is that I am conducting a simulation study, where I fit many different models on many Monte Carlo samples. The function is part of an internal package, and I want to provide the list of models, which are then evaluated inside the package.

I would like to also use the gam function from mgcv. In the documentation to gam, the following is said about its data argument, which is practically equivalent to, e.g., the documentation to lm:

A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called.

I hence try to use the same logic to compute a gam, thinking that eval(m, envir = df) in the compute_fit function defined above should evaluate the formula in the environment of df:

# Try with gam
library(mgcv)
gamcall = call("gam", quote(list(formula = y ~ x)))    
compute_fit(gamcall)  

However, this fails with the error message:

Error in eval(predvars, data, env): object 'y' not found

I realize that this error is likely related to this question, however my question is whether anyone can think of a workaround which lets me use gam in the same way as I use the other modeling functions? To the best of my understanding, the linked question does not provide a solution to this question.

Here is a complete reprex:

set.seed(1)
library(lme4)
#> Loading required package: Matrix
compute_fit <- function(m){
  # Generate some data
  df <- data.frame(x = rnorm(100), ID = rep(1:50, 2))
  df$y <- df$x + rnorm(100, sd = .1)
  # Evaluate the call
  eval(m, envir = df)
}

# Create a list of models
models <- list(
  lm = call("lm", quote(list(formula = y ~ x))),
  glm = call("glm", quote(list(formula = y ~ x))),
  lmer = call("lmer", quote(list(formula = y ~ x + (1 | ID))))
)

# Evaluate the call (this works fine)
model_fits <- lapply(models, compute_fit)

# Try with gam
library(mgcv)
#> Loading required package: nlme
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:lme4':
#> 
#>     lmList
#> This is mgcv 1.8-26. For overview type 'help("mgcv-package")'.
gamcall = call("gam", quote(list(formula = y ~ x)))    
compute_fit(gamcall)    
#> Error in eval(predvars, data, env): object 'y' not found

Solution

  • I would add df to the call instead of evaluating within df:

    compute_fit <- function(m){
      # Generate some data
      set.seed(1)
      df <- data.frame(x <- rnorm(100), y = rnorm(100) + x^3, ID = sample(4, 100, replace = TRUE))
      #add data parameter to call
      m[["data"]] <- quote(df)
      # Evaluate the call
      eval(m)
    }
    
    # Create a list of models
    models <- list(
      lm = quote(lm(formula = y ~ x)),
      glm = quote(glm(formula = y ~ x)),
      lmer = quote(lmer(formula = y ~ x + (1 | ID))),
      gam = quote(gam(formula = y ~ s(x)))
    )
    
    model_fits <- lapply(models, compute_fit)
    #works but lmer reports singular fit