In R, you can fit GAM models from the mgcv package using a formula which contains transformations such as log or sqrt, and by default the model.frame is returned (only the variables specified in the formula, with transformations applied). Is there any way I can recover the untransformed data.frame?
Example:
reg <- mgcv::gam(log(mpg) ~ disp + I(hp^2), data=mtcars)
returns
> head(reg$model,3)
              log(mpg) disp I(hp^2)
Mazda RX4     3.044522  160   12100
Mazda RX4 Wag 3.044522  160   12100
Datsun 710    3.126761  108    8649
But I want to get this untransformed dataset from the model's model.frame:
               mpg disp  hp
Mazda RX4     21.0  160 110
Mazda RX4 Wag 21.0  160 110
Datsun 710    22.8  108  93
Some background: the newdata argument of most models' predict() functions requires untransformed data, so I cannot feed the model.frame back into predict(). I am already aware that omitting the newdata argument returns the fitted values. My requirement is that the model object gives me back the raw data.
Here is one way: use glm instead of lm, even for Gaussian data. glm returns much more stuff than lm, including the raw data frame.
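As a minimal sketch of the glm route, assuming the same mtcars formula used in the question (the object names fit_glm and pred are only illustrative): the fitted object carries both the transformed model frame in $model and the data frame that was passed in via data in $data.

```r
# Gaussian glm (the default family), so the fit matches what lm would give
fit_glm <- glm(log(mpg) ~ disp + I(hp^2), data = mtcars)

head(fit_glm$model, 3)  # model frame: log(mpg), disp, I(hp^2)
head(fit_glm$data, 3)   # the data frame originally supplied via `data`

# The stored raw data can go straight back into predict()
pred <- predict(fit_glm, newdata = fit_glm$data)
```

Note that $data holds every column of the supplied data frame, not only the variables named in the formula.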
Well, if you are asking mgcv questions, you'd better provide a mgcv example.
mgcv follows the same conventions as glm. Have a read of ?gamObject for a full list of what gam can return. You will see that it can return data, if you set keepData via the control argument of gam. When you call gam, add the following:
control = gam.control(keepData = TRUE)
Here is a simple, reproducible example:
library(mgcv)
set.seed(1)  # fix the random draws so the example is reproducible
dat <- data.frame(x = runif(50), y = rnorm(50))
fit <- gam(y ~ s(x, bs = 'cr', k = 5), data = dat, control = gam.control(keepData = TRUE))
head(fit$model)  # model frame
head(fit$data)   # original data
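Applied to the model from the question, the same idea might look like the sketch below. The names raw_vars, raw and pred are hypothetical, and using all.vars() on the formula to pick out the untransformed columns is an assumption of this example rather than something mgcv does for you.

```r
library(mgcv)

# Refit the question's model, asking gam to keep the supplied data
reg <- gam(log(mpg) ~ disp + I(hp^2), data = mtcars,
           control = gam.control(keepData = TRUE))

# Names of the raw variables used in the formula, in order of appearance
raw_vars <- all.vars(formula(reg))   # "mpg" "disp" "hp"

# Untransformed columns, taken from the stored data rather than the model frame
raw <- reg$data[, raw_vars]
head(raw, 3)

# The raw data is what predict() expects as newdata
pred <- predict(reg, newdata = raw)
```

Subsetting by all.vars() keeps only the columns the formula refers to; reg$data itself still contains every column of mtcars.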