I wrote a wrapper function around `mgcv::gam()` that writes the model directly to disk and does a few other convenient things. So far, so good: every argument is passed on and it works, except when I add an offset argument to be passed on to `mgcv::gam(offset = )`. Below is some example code.
```r
library(mgcv)

gam1 <- function(form,
                 data,
                 family,
                 knots = NULL) {
  gam(formula = form,
      knots = knots,
      data = data,
      family = family)
}

gam1(Sepal.Length ~ s(Sepal.Width),
     data = iris,
     family = 'gaussian')
```
Arguments such as `knots` and `family` are passed on to `gam()`, and that works. However, if I add an `offset_` argument, it is not passed on and an error is thrown instead:
```r
gam2 <- function(form,
                 data,
                 family,
                 knots = NULL,
                 offset_ = NULL) {
  gam(formula = form,
      data = data,
      family = family,
      offset = offset_)
}

gam2(Sepal.Length ~ s(Sepal.Width),
     data = iris,
     family = 'gaussian',
     offset_ = NULL)
```
`offset_` is not passed on, and the call fails (even with `offset_ = NULL`) with this error: `Error in eval(extras, data, env) : object 'offset_' not found`. If I name the wrapper's argument `offset` instead, it fails with a different error: `invalid type (closure) for variable '(offset)'`.
Q: Why does my wrapper fail to pass on `offset_`? How can I make it run?
The error is raised at the last line of this excerpt from the source of `mgcv::gam()`, but I don't quite understand why:

```r
mf$drop.unused.levels <- drop.unused.levels
mf[[1]] <- quote(stats::model.frame) ## as.name("model.frame")
pmf <- mf
mf <- eval(mf, parent.frame()) # the model frame now contains all the data
```
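The problem is scoping. `gam()` builds its model frame with `stats::model.frame()`, which looks for the variables in `formula` and `offset` first in the `data` argument, then in the environment attached to `formula`. Normally that is the environment where `formula` was created; in your example, the global environment. The frame of your wrapper, where `offset_` lives, is not searched, so `offset_` is not found. Arguments such as `family` and `knots` are evaluated normally, which is why they pass through without trouble. When the wrapper argument is called `offset`, the lookup falls through to the function `stats::offset()`, hence the complaint about a closure.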
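Since `gam()` hands this step to `stats::model.frame()`, the lookup rule can be reproduced with base R alone. The sketch below is a minimal illustration, assuming the formula is created at top level; `frame_offset_()` and `frame_offset()` are hypothetical wrappers that trigger the two errors from the question, and the last lines show the offset being found once it is a column of `data`.

```r
# Minimal base-R sketch (mgcv not needed) of the lookup rule: model.frame()
# evaluates extra arguments such as `offset` first in `data`, then in
# environment(formula). The calling wrapper's frame is not searched.

# Hypothetical wrapper mirroring gam2(): the offset lives only in this frame
frame_offset_ <- function(form, data, offset_ = NULL) {
  stats::model.frame(form, data = data, offset = offset_)
}

# Hypothetical wrapper whose argument is named `offset`
frame_offset <- function(form, data, offset = NULL) {
  stats::model.frame(form, data = data, offset = offset)
}

f <- Sepal.Length ~ Sepal.Width  # created at top level: environment is globalenv()

# `offset_` is neither a column of iris nor visible from globalenv()
err1 <- tryCatch(frame_offset_(f, iris), error = conditionMessage)
print(err1)

# `offset` is not a column of iris, so the lookup reaches the function
# stats::offset(), which model.frame() rejects as a variable
err2 <- tryCatch(frame_offset(f, iris), error = conditionMessage)
print(err2)

# Once the offset is a column of `data`, the lookup succeeds
d <- iris
d$offset_copy <- log(iris$Petal.Width)
mf <- stats::model.frame(f, data = d, offset = offset_copy)
head(stats::model.offset(mf))
```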
You should be able to get this to work by adding the offset values to the wrapper's local copy of `data` as a new column, for example:
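```r
gam3 <- function(form,
                 data,
                 family,
                 knots = NULL,
                 offset_ = NULL) {
  if (is.null(offset_)) {
    gam(formula = form,
        data = data,
        family = family,
        knots = knots,
        offset = NULL)
  } else {
    # Store the offset as a column of the local copy of `data`,
    # where gam()'s model.frame() step will look for it
    data$offset_copy <- offset_
    gam(formula = form,
        data = data,
        family = family,
        knots = knots,
        offset = offset_copy)
  }
}
```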
If `data` already has a column named `offset_copy`, this will overwrite it, so be sure to use a name that won't already be in `data`.
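To avoid picking a safe name by hand, the wrapper can generate one. The sketch below is a hypothetical variant, `gam_unique_offset()`, not part of mgcv: it assumes `make.unique()` to find a column name absent from `data`, and `bquote()` to splice that name into the `gam()` call as a bare symbol, since `gam()` needs the column name unquoted rather than as a string.

```r
library(mgcv)

# Hypothetical variant of gam3(): store the offset in `data` under a column
# name that is not already used, so no existing column is overwritten.
gam_unique_offset <- function(form, data, family = gaussian(), offset_ = NULL, ...) {
  if (is.null(offset_)) {
    return(gam(formula = form, data = data, family = family, ...))
  }
  # make.unique() appends ".1", ".2", ... to repeated names, so the last
  # element is a name that does not occur in names(data)
  nm <- tail(make.unique(c(names(data), "offset_copy")), 1)
  data[[nm]] <- offset_
  # Splice the column name into the call as a symbol; gam() then finds it
  # inside `data` when it builds the model frame
  cl <- bquote(gam(formula = form, data = data, family = family,
                   offset = .(as.name(nm)), ...))
  eval(cl)
}
```

Because `make.unique()` only adds a suffix when the name is taken, the column is still called `offset_copy` in the common case.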
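Edited to add: @GavinSimpson suggested modifying the formula in his answer, to avoid problems with `predict()` (mgcv ignores an offset supplied through the `offset` argument when predicting, but uses an `offset()` term in the formula). I'd suggest a different modification from the one he used: instead of deparsing the formula and offset, modify the formula directly. For example: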
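```r
fun <- function(form, data, offset_ = NULL, ...) {
  ## capture what was passed to offset_, unevaluated
  off <- substitute(offset_)
  ## test `off` rather than `offset_`: evaluating offset_ here would look
  ## for its variables outside `data` and could fail
  if (!is.null(off)) { # need to add offset
    form[[3]] <- call("+", form[[3]], call("offset", off))
  }
  ## fit and return model; other arguments such as family pass through `...`
  gam(form, data = data, ...)
}
```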
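The function `call()` constructs an unevaluated call, so the `form[[3]]` line replaces the right-hand side of the formula with a call to `"+"` whose arguments are the old right-hand side and a call to `offset()` wrapping the captured offset expression. Because the offset is now an ordinary formula term, `gam()` looks it up in `data` just like the other variables.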
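The formula surgery can be checked without fitting a model. The sketch below copies the editing steps of `fun()` into a hypothetical helper, `add_offset()`, that returns the edited formula instead of passing it to `gam()`, so the new right-hand side, the class, and the environment can be inspected. `s()` is never evaluated here, so mgcv does not need to be loaded.

```r
# Hypothetical helper: the formula-editing steps from fun(), without gam()
add_offset <- function(form, offset_ = NULL) {
  off <- substitute(offset_)  # unevaluated expression, e.g. log(Petal.Width)
  if (!is.null(off)) {
    # right-hand side becomes: <old RHS> + offset(<expr>)
    form[[3]] <- call("+", form[[3]], call("offset", off))
  }
  form
}

f <- Sepal.Length ~ s(Sepal.Width)
g <- add_offset(f, log(Petal.Width))
print(g)

# Class and environment are carried over from the original formula
inherits(g, "formula")
identical(environment(g), environment(f))
```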
The advantage over deparsing is that this should handle unusual cases properly: extremely long formulas or offsets that deparse to several lines, and formulas whose environment matters, since this version leaves the formula's environment unchanged.
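Both points can be seen with a simplified, hypothetical deparse-and-paste helper, `add_offset_text()`, which is not the code from Gavin's answer: a formula rebuilt from text gets the environment of whichever function rebuilt it, and a long right-hand side deparses to more than one string.

```r
# Hypothetical deparse-and-paste helper, for comparison only (not the code
# from Gavin Simpson's answer). as.formula() gives the rebuilt formula the
# environment of its caller, i.e. this function's frame.
add_offset_text <- function(form, offset_text) {
  as.formula(paste(deparse(form), "+ offset(", offset_text, ")"))
}

f <- Sepal.Length ~ s(Sepal.Width)
h <- add_offset_text(f, "log(Petal.Width)")
print(h)
# FALSE: the original environment has been replaced
identical(environment(h), environment(f))

# A long formula deparses to several strings (one per output line), so a
# paste() like the one above would produce several pieces, not one formula
long <- reformulate(paste0("x", 1:40), response = "y")
length(deparse(long))
```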