
R: predict values on new dataset


I fitted a model to a training dataset (DT1) and would like to make predictions from this same model, with the exact same parameters, on a new dataset (DT2).

This is what I have tried:

library(data.table)
set.seed(1)
n <- 10

# Sample 1
DT1 <- data.table(x = rnorm(n), y = rnorm(n))
mdl <- DT1[, lm(y ~ poly(x, 3))]

# Sample 2
DT2 <- data.table(x = rnorm(n))
DT2[, yhat_a := cbind(1, x, x**2, x**3) %*% coef(mdl)]
DT2[, yhat_b := predict.lm(mdl)]
DT2[, yhat_c := predict.lm(mdl, type = "terms")]
DT2[, yhat_d := predict.lm(mdl, DT2)]
DT2[, yhat_e := predict.lm(mdl, DT2, type = "terms")]

I expected the predictions to correspond to yhat_a, but as you can see, none of the predict.lm() calls reproduces it.

> print(DT2)
              x     yhat_a      yhat_b      yhat_c      yhat_d      yhat_e
 1:  0.91897737 -2.9089955  0.39129117  0.14244620 -0.02386652 -0.27271149
 2:  0.78213630 -2.1789312  0.93415007  0.68530510  0.25958313  0.01073816
 3:  0.07456498  0.1452663 -0.01907297 -0.26791794  0.95832991  0.70948493
 4: -1.98935170 -6.8834694 -2.13507075 -2.38391572 -4.25695978 -4.50580475
 5:  0.61982575 -1.4310139  0.85431936  0.60547439  0.53332504  0.28448007
 6: -0.05612874  0.3090645  0.01438047 -0.23446450  0.94734161  0.69849664
 7: -0.15579551  0.3784421  0.70651054  0.45766557  0.90972644  0.66088147
 8: -1.47075238 -3.1916187  0.34014661  0.09130164 -1.93990943 -2.18875440
 9: -0.47815006  0.2741404  0.59593158  0.34708661  0.61523830  0.36639333
10:  0.41794156 -0.6792907  0.80586363  0.55701866  0.77941821  0.53057324

What am I missing?


Solution

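  • The defaults for poly() include raw = FALSE, which builds orthogonal polynomial columns instead of the raw powers x, x^2, x^3. The coefficients of mdl therefore belong to that orthogonal basis, so multiplying them by cbind(1, x, x**2, x**3), as in yhat_a, does not give the model's prediction.

    If you set raw = TRUE, then your manually calculated yhat_a will equal your yhat_d. The value you want is yhat_d: predict.lm(mdl, DT2) evaluates the fitted model on DT2. Without a new dataset (yhat_b), predict.lm() returns the fitted values for DT1. With type = "terms" (yhat_c, yhat_e), it returns the centered term contributions without the constant, which is why those columns sit a fixed offset (about 0.2488 here) below yhat_b and yhat_d.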
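
A minimal sketch of the points above, written in base R rather than data.table so that it runs without extra packages. The names train, newdata, mdl_orth, mdl_raw, X_new and tt are illustrative and do not come from the original post. It checks that the manual formula matches predict() once raw = TRUE is used, that both bases give the same predictions on new data, and that the "terms" output plus its "constant" attribute recovers the usual prediction.

```r
set.seed(1)
n <- 10

# Training and new data as plain data frames (base R only)
train   <- data.frame(x = rnorm(n), y = rnorm(n))
newdata <- data.frame(x = rnorm(n))

# Default orthogonal basis vs. raw powers of x
mdl_orth <- lm(y ~ poly(x, 3), data = train)
mdl_raw  <- lm(y ~ poly(x, 3, raw = TRUE), data = train)

# The manual formula is only valid with the raw-basis coefficients
X_new  <- cbind(1, newdata$x, newdata$x^2, newdata$x^3)
manual <- drop(X_new %*% coef(mdl_raw))

pred_raw  <- unname(predict(mdl_raw,  newdata = newdata))
pred_orth <- unname(predict(mdl_orth, newdata = newdata))

all.equal(manual, pred_raw)     # manual formula vs. predict(), raw basis
all.equal(pred_raw, pred_orth)  # both bases describe the same fitted cubic

# type = "terms" leaves out the constant, which is stored as an attribute
tt <- predict(mdl_orth, newdata = newdata, type = "terms")
all.equal(unname(rowSums(tt)) + attr(tt, "constant"), pred_orth)
```

In practice there is no need to refit with raw = TRUE just to predict: passing the new data to predict() on the original model is enough, and the orthogonal basis is usually the better-conditioned choice for fitting.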