
Pass offset argument in mgcv::gam() wrapper


I wrote a wrapper function around mgcv::gam() that writes the fitted model directly to disk and does a few other convenient things. So far, so good: every argument is passed on and it works, except when I add an offset argument to be passed on to mgcv::gam(offset = ). Here is some example code:

library(mgcv)

gam1 = function(form,
                data,
                family,
                knots = NULL) {
  gam(formula = form,
      knots = knots,
      data = data,
      family = family)
}

gam1(Sepal.Length ~ s(Sepal.Width),
     data = iris,
     family = 'gaussian')

Arguments such as knots and family are passed on to gam(), and this works. However, if I add offset_ to the mix, it is not passed on and gam() throws an error:

gam2 = function(form,
                data,
                family,
                knots = NULL,
                offset_ = NULL) {
  gam(formula = form,
      data = data,
      family = family,
      offset = offset_)
}

gam2(Sepal.Length ~ s(Sepal.Width),
     data = iris,
     family = 'gaussian',
     offset_ = NULL)

offset_ is not passed on, and gam() stops with this error: Error in eval(extras, data, env) : object 'offset_' not found. If I name the wrapper argument offset instead of offset_ (and pass offset = offset), I get a different error: invalid type (closure) for variable '(offset)'.

Q: Why does my wrapper fail to pass on offset_? How can I make it run?

The error happens at the last line of the gam() source snippet below, but I don't quite understand why.

mf$drop.unused.levels <- drop.unused.levels
mf[[1]] <- quote(stats::model.frame) ## as.name("model.frame")
pmf <- mf
mf <- eval(mf, parent.frame()) # the model frame now contains all the data 
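To see why that last line fails, here is a stripped-down sketch of the same call-rewriting pattern using only base R, assuming gam() hands offset to model.frame() the way the snippet suggests. toy_fit(), wrap_underscore() and wrap_same_name() are hypothetical names for illustration, not part of mgcv. The point it shows: formula and data are ordinary arguments of model.frame(), so they are found in the wrapper's frame, but offset lands in model.frame()'s ..., which is captured unevaluated and then evaluated in data, then in environment(formula).

```r
# Hypothetical toy fitter copying the pattern in the gam() snippet:
# capture the call, swap in stats::model.frame, evaluate in the caller's frame.
toy_fit <- function(formula, data, offset = NULL) {
  mf <- match.call(expand.dots = FALSE)
  mf[[1]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())
  model.offset(mf)  # the "(offset)" column of the model frame, if any
}

# Hypothetical wrappers mirroring gam2() and the "offset = offset" variant
wrap_underscore <- function(form, data, offset_ = NULL) {
  toy_fit(formula = form, data = data, offset = offset_)
}
wrap_same_name <- function(form, data, offset = NULL) {
  toy_fit(formula = form, data = data, offset = offset)
}

fo <- Sepal.Length ~ Sepal.Width  # created at top level: environment(fo) is globalenv()
log_pl <- log(iris$Petal.Length)

# The symbol offset_ is looked up in iris, then in globalenv(): not found
err_underscore <- tryCatch(wrap_underscore(fo, iris, offset_ = log_pl),
                           error = conditionMessage)
# message: object 'offset_' not found

# The symbol offset is looked up in iris, then globalenv() and the search
# path, where it finds the function stats::offset (a closure)
err_same_name <- tryCatch(wrap_same_name(fo, iris, offset = log_pl),
                          error = conditionMessage)
# message: invalid type (closure) for variable '(offset)'

# Called from the top level, the offset expression resolves inside iris
ok <- toy_fit(fo, data = iris, offset = log(Petal.Length))
```

Both messages match the ones reported for gam2(), which suggests the failure comes from model.frame()'s lookup rules rather than from anything specific to gam().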

Solution

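  • The problem is scoping. gam() builds its model frame with model.frame(), which looks up the variables in formula and the offset expression first in the data argument, then in the environment attached to formula. Normally that is the environment where formula was created; in your example that is the global environment, not the local frame of your wrapper, so offset_ is never found. (With offset = offset, the lookup instead reaches the offset() function from the stats package, hence the "closure" error.)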

    You should be able to get things to work by adding the offset as a column of the local copy of data, since data is searched first. For example:

    gam3 <- function(form,
                     data,
                     family,
                     knots = NULL,
                     offset_ = NULL) {
      if (is.null(offset_)) {
        gam(formula = form,
            data = data,
            family = family,
            knots = knots)
      } else {
        # model.frame() searches data first, so store the offset as a column
        data$offset_copy <- offset_
        gam(formula = form,
            data = data,
            family = family,
            knots = knots,
            offset = offset_copy)
      }
    }
    

    If data has a column named offset_copy, this will overwrite it, so be sure to use a name that won't already be in data.

    Edited to add: @GavinSimpson suggested in his answer modifying the formula instead, to avoid problems with predict(): predict.gam() ignores an offset supplied through the offset argument, but does use an offset() term in the formula. I'd suggest a different modification than his: instead of deparsing the formula and offset, modify the formula directly. For example,

    fun <- function(form, data, offset_ = NULL, ...) {
        ## capture what was passed to offset_, unevaluated
        off <- substitute(offset_)
        ## test the captured expression, not offset_ itself: forcing offset_
        ## here would fail if it refers to a column of data
        if (!is.null(off)) { # need to add offset
            form[[3]] <- call("+", form[[3]], call("offset", off))
        }
        ## fit and return model
        gam(form, data = data, ...)
    }
    

    call() builds an unevaluated call, so the form[[3]] line replaces the right-hand side (RHS) of the formula with a call to "+" whose arguments are the old RHS and a call to offset() wrapping the captured offset expression.

    The advantage over deparsing is that it should handle unusual cases properly, e.g. extremely long formulas, offsets that deparse to several lines, or formulas whose environment matters, because this version leaves the formula's environment unchanged.
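The predict() point from the edit above can be checked directly. This sketch fits the same model on iris twice, once with the offset added to the formula and once through gam(offset = ), and compares predictions. add_offset_gam() is a hypothetical stand-in for fun(); it calls is.null() on the captured expression off, so an offset that names a column of data (here log(Petal.Length)) is never evaluated in the caller's environment. It assumes, as the mgcv help pages for gam() and predict.gam() state, that an offset passed as an argument is ignored when predicting.

```r
library(mgcv)

# Hypothetical stand-in for fun(): splice offset(<expr>) into the formula RHS
add_offset_gam <- function(form, data, offset_ = NULL, ...) {
  off <- substitute(offset_)  # unevaluated offset expression, or NULL
  if (!is.null(off)) {
    # RHS becomes <old RHS> + offset(<off>); environment(form) is kept
    form[[3]] <- call("+", form[[3]], call("offset", off))
  }
  gam(form, data = data, ...)
}

# Same model fitted two ways
m_formula <- add_offset_gam(Sepal.Length ~ s(Sepal.Width), data = iris,
                            offset_ = log(Petal.Length))
m_arg <- gam(Sepal.Length ~ s(Sepal.Width), data = iris,
             offset = log(Petal.Length))

# Gaussian family, identity link: predictions are on the response scale
p_formula <- as.numeric(predict(m_formula, newdata = iris))
p_arg <- as.numeric(predict(m_arg, newdata = iris))
```

Both fits should have identical fitted values, but only p_formula reproduces them; p_arg should come out shifted down by log(Petal.Length), because the argument offset is dropped at prediction time.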