
lapply, glm, and speedglm inside a function: argument "data" is missing, with no default


I am using the mtcars data to show my problem. The following code works fine with glm. It generates new models by adding each variable in vlist to the model glm(vs ~ mpg, family = binomial(), data = mtcars).

check_glm <- function(crude, vlist, data, ...){
  a <- glm(crude, data = data, family = binomial())
  lapply(vlist, function(x) update(a, as.formula(paste0(". ~ . +", x))))
}
check_glm(crude = "vs ~ mpg", vlist = c("am", "hp"), data = mtcars)

However, when I replaced glm with speedglm,

library(speedglm)
check_speedglm <- function(crude, vlist, data, ...){
  a <- speedglm(crude, data = data, family = binomial())
  lapply(vlist, function(x) update(a, as.formula(paste0(". ~ . +", x))))
}
check_speedglm(crude = "vs ~ mpg", vlist = c("am", "hp"), data = mtcars)

I got:

Error in model.frame.default(formula = vs ~ mpg + am, data = data, drop.unused.levels = TRUE) : argument "data" is missing, with no default.

I think the problem is in the lapply line but I could not work out a solution. Any suggestions to fix this would be appreciated.


Solution

  • Essentially, you are mixing functions from different packages that may not be compatible with each other. Although glm and speedglm have similar names, they come from different packages, were written by different authors for different purposes, and return different objects (class glm versus class speedglm).

    Specifically, glm is part of the stats package that ships with R, and it works with the related stats function update.

    Per update docs,

    update will update and (by default) re-fit a model. It does this by extracting the call stored in the object, updating the call and (by default) evaluating that call. 

    Main argument:

    object, x: An existing fit from a model function such as lm, glm and many others

    Therefore, if speedglm stores the call (capturing the formula, data, and other arguments) and returns an object structured like the one glm returns (which inherits from the lm class), then update would work.
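
    To make the quoted behaviour concrete, here is a minimal sketch that uses only stats::glm, so it says nothing about speedglm's internals. With evaluate = FALSE, update returns the modified call instead of re-fitting, which shows what would be evaluated. The helper fit_in_function is a hypothetical name introduced for illustration; it shows that a model fitted inside a function stores the symbol data rather than mtcars, which is the same data = data that appears in the error message from the question.

    ```r
    # Fit the crude model at top level; glm stores the matched call in the object.
    fit <- glm(vs ~ mpg, family = binomial(), data = mtcars)
    stored_call <- getCall(fit)

    # evaluate = FALSE returns the updated, unevaluated call instead of re-fitting.
    new_call <- update(fit, . ~ . + am, evaluate = FALSE)

    # Hypothetical helper: the same model fitted inside a function.
    fit_in_function <- function(data) {
      glm(vs ~ mpg, family = binomial(), data = data)
    }
    fit2 <- fit_in_function(mtcars)

    # The stored call refers to the symbol `data`, not to mtcars, so re-evaluating
    # it only works where a variable called `data` can be found.
    data_arg <- getCall(fit2)$data
    ```

    Printing new_call and data_arg is a quick way to inspect what update would try to evaluate before it actually re-fits anything.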


    To resolve this, consider doing what update does yourself: build the formula dynamically and call the model function once per iteration with lapply. This works with both functions, since each accepts a formula object.

    library(speedglm) 
    
    check_speedglm <- function(crude, vlist, data, ...){ 
       lapply(seq_along(vlist), function(i)
           speedglm(as.formula(paste(crude, "+", paste(vlist[1:i], collapse=" + "))), 
                    data = data, family = binomial()) 
       )
    } 
    
    check_speedglm(crude = "vs ~ mpg", vlist = c("am", "hp"), data = mtcars)
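
    Note that vlist[1:i] adds the variables cumulatively (vs ~ mpg + am, then vs ~ mpg + am + hp), whereas the update call in the question adds each variable to the crude model on its own. If the per-variable behaviour is what is wanted, a minimal sketch could look like the following. The names check_models and fit_fun are hypothetical and not part of the original answer; fit_fun defaults to stats::glm so the sketch runs without extra packages.

    ```r
    # Sketch: check_models and fit_fun are hypothetical names used for illustration.
    check_models <- function(crude, vlist, data, fit_fun = stats::glm) {
      fits <- lapply(vlist, function(x) {
        # Build "crude + x" for this variable only, e.g. vs ~ mpg + am.
        f <- as.formula(paste(crude, "+", x))
        fit_fun(f, data = data, family = binomial())
      })
      # Name each fit after the variable that was added.
      setNames(fits, vlist)
    }

    models <- check_models(crude = "vs ~ mpg", vlist = c("am", "hp"), data = mtcars)

    # Collect the formula of each fitted model for inspection.
    model_formulas <- vapply(models, function(m) deparse(formula(m)), character(1))
    ```

    Assuming speedglm is installed, passing fit_fun = speedglm::speedglm should fit the same formulas with speedglm, because the model function is called directly each time rather than through update.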