Tags: r, regression, linear-regression, lm, mgcv

R Recover original data.frame from model.frame


In R, you can fit GAMs with the mgcv package using a formula that contains transformations such as log or sqrt. By default, the fitted object stores only the model.frame (just the variables used in the formula, with the transformations already applied).

Is there any way I can recover the untransformed data.frame?

Example:

reg <- mgcv::gam(log(mpg) ~ disp + I(hp^2), data=mtcars)

returns

> head(reg$model,3)
              log(mpg) disp I(hp^2)
Mazda RX4     3.044522  160   12100
Mazda RX4 Wag 3.044522  160   12100
Datsun 710    3.126761  108    8649

But I want to recover this untransformed dataset from the fitted model:

               mpg disp  hp
Mazda RX4     21.0  160 110
Mazda RX4 Wag 21.0  160 110
Datsun 710    22.8  108  93

Some background: the newdata argument of most models' predict() methods requires untransformed data, so I cannot feed the model.frame back into predict(). I am already aware that omitting the newdata argument returns the fitted values. My requirement is that the model object gives me back the raw data.


Solution

  • Here is one way: use glm instead of lm, even for Gaussian data. glm returns much more stuff than lm, including the raw data frame.


    Well, if you are asking mgcv questions, you'd better provide a mgcv example.

    mgcv follows the same convention as glm. Have a read of ?gamObject for the full list of what gam can return. You will see that it can return data if you set keepData via the control argument of gam. When you call gam, add the following:

    control = gam.control(keepData = TRUE)
    

    Here is a simple, reproducible example:

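    library(mgcv)
    set.seed(1)  # fix the random draws so the example is reproducible
    dat <- data.frame(x = runif(50), y = rnorm(50))
    # keepData = TRUE stores the data argument in the fitted object
    fit <- gam(y ~ s(x, bs = 'cr', k = 5), data = dat, control = gam.control(keepData = TRUE))
    head(fit$model)  # model frame
    head(fit$data)  # original data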
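
Applied to the model from the question, the same idea can be sketched as follows. This is only an illustration, assuming mgcv is installed and using the built-in mtcars data. The names reg, fit_glm, p_new and p_fit are made up for the example. The glm fit is included to show the `data` component mentioned at the top of this answer.

```r
library(mgcv)

# Model from the question, but asking gam() to keep the data argument
reg <- gam(log(mpg) ~ disp + I(hp^2), data = mtcars,
           control = gam.control(keepData = TRUE))

head(reg$model, 3)                           # model frame: log(mpg), disp, I(hp^2)
head(reg$data[, c("mpg", "disp", "hp")], 3)  # untransformed columns

# The stored raw data can be passed straight back as newdata
p_new <- predict(reg, newdata = reg$data)
p_fit <- predict(reg)  # no newdata: predictions for the fitting data
all.equal(as.numeric(p_new), as.numeric(p_fit))

# glm() keeps the data argument by default, in the `data` component
fit_glm <- glm(log(mpg) ~ disp + I(hp^2), data = mtcars)
head(fit_glm$data[, c("mpg", "disp", "hp")], 3)
```

Note that `reg$data` holds whatever was passed as `data`, so it includes every column of mtcars, not only the ones used in the formula. Subset it if you want just the model variables.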