Search code examples
rglmlm

update() a model inside a function with local covariate


I need to update a regression model from inside a function. Ideally, the function should work with any kind of models (lm, glm, multinom, clm). More precisely, I need to add one or several covariates that are defined inside the function. Here is an exemple.

MyUpdate <- function(model){
     randData <- data.frame(var1=rnorm(length(model$residuals)))
     model2 <- update(model, ".~.+randData$var1")
     return(model2)
}

Here is an example use

data(iris)
model1 <- lm(Sepal.Length~Species, data=iris)
model2 <- MyUpdate(model1)

Error in eval(expr, envir, enclos) : object 'randData' not found

Here is another example with glm

model1 <- glm(Sepal.Length>5~Species, data=iris, family=binomial)
model2 <- MyUpdate(model1)

Any idea?


Solution

  • The problem is that var1 is looked up in the data frame and the model's environment but not within the environment in MyUpdate.

    1) To avoid this problem update the model with not only the revised formula but also a revised data frame containing var1 :

    MyUpdate <- function(model) {
         mf <- model.frame(model)
         n <- nrow(mf)
         var1 <- rnorm(n)
         update(model, formula = . ~ . + var1, data = data.frame(mf, var1))
    }
    

    The above is probably the best solution of the ones presented in this answer as it avoids mucking around with internal structures. It seems to work for lm, glm, multinom and clm. The other solutions below do muck around with internal structures and therefore are less general across model fitting routines. The others all work with lm but may not work for others.

    test Here is a test which runs without errors on each of the model fitting functions mentioned in the question if MyUpdate is as above and also the solutions in (2) all run the tests without error. The solution (3) works at least with lm.

    model.lm <- lm(Sepal.Length~Species, data=iris)
    MyUpdate(model.lm)
    
    model.glm <- glm(Sepal.Length~Species, data=iris)
    MyUpdate(model.glm)
    
    library(nnet)
    example(multinom)
    MyUpdate(bwt.mu)
    
    library(ordinal)
    model.clm <- clm(rating ~ temp * contact, data = wine)
    MyUpdate(model.clm)
    

    The remaining solutions perform more direct access of internals making them less robust to changing the model function.

    2) Messing with Environments

    In addition here are three solutions that involve messing with environments. The first is the cleanest followed by the second and then the third. The third is the least acceptable since it actually writes var1 into the model's environment (dangerously overwriting any var1 there) but it is the shortest. They work with lm, glm multinom and clm.

    Note that we do not really need to put var1 into a data frame nor is it necessary to put the updating formula in quotes and we have changed both in all examples below. Also the return statement can be removed and we have done that too.

    2a) The following modifies the environment of the original model to point to a new proxy proto object containing var1 whose parent is the original model environment. Here proto(p, var1 = rnorm(n)) is the proxy proto object (a proto object is an environment with differing semantics) and p is the parent of the proxy.

    library(proto)
    
    MyUpdate <- function(model){
    
         mf <- model.frame(model)
         n <- nrow(mf)
         var1 <- rnorm(n)
         p <- environment(formula(model))
    
         if (is.null(model$formula)) {
               attr(model$terms, ".Environment") <- proto(p, var1 = var1)
         } else environment(model$formula) <- proto(p, var1 = var1)
    
         update(model, . ~ . + var1) 
    }
    
    #note: the period is shorthand for 
    keep everything on either the left or right hand side 
    of the formula (i.e., the ~) and 
    the + or - sign are used to add or remove model terms
    

    For more information read the Proxies section in this document: http://r-proto.googlecode.com/files/prototype_approaches.pdf

    2b) This could alternately be done without proto but at the expense of expanding the ## line to three lines containing some additional ugly environment manipulations. Here e is the proxy environment.

    MyUpdate <- function(model){
         mf <- model.frame(model)
         n <- nrow(mf)
         var1 <- rnorm(n)
         p <- environment(formula(model))
    
         e <- new.env(parent = p)
         e$var1 <- var1
    
         if (is.null(model$formula)) attr(model$terms, ".Environment") <- e
         else environment(model$formula) <- e
    
         update(model, . ~ . + var1)
    }
    

    2c) Shortest but the most hackish is to stick var1 into the original model environment:

    MyUpdate <- function(model){
         mf <- model.frame(model)
         n <- nrow(mf)
         var1 <- rnorm(n)       
    
         if (is.null(model$formula)) attr(model$terms, ".Environment")$var1 <- var1
         else environment(model$formula)$var1 <- var1
    
         update(model, . ~ . + var1)
    }
    

    3) eval/substitute This solution does use eval which is sometimes frowned upon. It works on lm and glm and on clm it works except that the output does not display var1 but rather the expression that computes it.

    MyUpdate <- function(model) {
         m <- eval.parent(substitute(update(model, . ~ . + rnorm(nrow(model.frame(model))))))
         m$call$formula <- update(formula(model), . ~ . + var1)
         names(m$coefficients)[length(m$coefficient)] <- "var1"
         m
    }
    

    REVISED Added additional solutions, simplified (1), got solutions in (2) to run all examples in test section.