I fitted a model to a training dataset (DT1) and would like to make predictions on a new dataset (DT2), using this same model with the exact same parameters.
This is what I have tried:
library(data.table)
set.seed(1)
n <- 10
# Sample 1
DT1 <- data.table(x = rnorm(n), y = rnorm(n))
mdl <- DT1[, lm(y ~ poly(x, 3))]
# Sample 2
DT2 <- data.table(x = rnorm(n))
DT2[, yhat_a := cbind(1, x, x**2, x**3) %*% coef(mdl)]
DT2[, yhat_b := predict.lm(mdl)]
DT2[, yhat_c := predict.lm(mdl, type = "terms")]
DT2[, yhat_d := predict.lm(mdl, DT2)]
DT2[, yhat_e := predict.lm(mdl, DT2, type = "terms")]
The expected prediction should correspond to yhat_a, but as you can see, none of the predict.lm() calls produces it.
> print(DT2)
x yhat_a yhat_b yhat_c yhat_d yhat_e
1: 0.91897737 -2.9089955 0.39129117 0.14244620 -0.02386652 -0.27271149
2: 0.78213630 -2.1789312 0.93415007 0.68530510 0.25958313 0.01073816
3: 0.07456498 0.1452663 -0.01907297 -0.26791794 0.95832991 0.70948493
4: -1.98935170 -6.8834694 -2.13507075 -2.38391572 -4.25695978 -4.50580475
5: 0.61982575 -1.4310139 0.85431936 0.60547439 0.53332504 0.28448007
6: -0.05612874 0.3090645 0.01438047 -0.23446450 0.94734161 0.69849664
7: -0.15579551 0.3784421 0.70651054 0.45766557 0.90972644 0.66088147
8: -1.47075238 -3.1916187 0.34014661 0.09130164 -1.93990943 -2.18875440
9: -0.47815006 0.2741404 0.59593158 0.34708661 0.61523830 0.36639333
10: 0.41794156 -0.6792907 0.80586363 0.55701866 0.77941821 0.53057324
What am I missing?
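Check out this answer for more info. The defaults for poly() include raw = FALSE, where the columns are scaled to be orthogonal. The fitted coefficients therefore belong to that orthogonal basis, not to the raw powers x, x^2, x^3, so multiplying them against cbind(1, x, x**2, x**3) as in yhat_a does not reproduce the model's predictions.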
If you set raw = TRUE, then your manually calculated yhat_a will equal your yhat_d.
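As a minimal sketch of that suggestion, the snippet below refits the question's model with raw = TRUE and compares the manual calculation with predict() on the new data. The names mdl_raw, mdl_orth, yhat_manual and yhat_predict are introduced here for illustration only and do not appear in the question.

```r
library(data.table)

set.seed(1)
n <- 10

# Training sample, as in the question
DT1 <- data.table(x = rnorm(n), y = rnorm(n))

# Refit with raw polynomial terms, so the coefficients
# multiply 1, x, x^2 and x^3 directly
mdl_raw <- lm(y ~ poly(x, 3, raw = TRUE), data = DT1)

# New sample
DT2 <- data.table(x = rnorm(n))

# Manual prediction from the raw-power design matrix
DT2[, yhat_manual := as.vector(cbind(1, x, x^2, x^3) %*% coef(mdl_raw))]

# predict() with newdata evaluates the fitted polynomial at DT2$x
DT2[, yhat_predict := unname(predict(mdl_raw, newdata = DT2))]

# The two columns should agree up to floating-point tolerance
isTRUE(all.equal(DT2$yhat_manual, DT2$yhat_predict))

# For comparison: the default orthogonal fit, predicted on the same new data
mdl_orth <- lm(y ~ poly(x, 3), data = DT1)
isTRUE(all.equal(unname(predict(mdl_orth, newdata = DT2)), DT2$yhat_predict))
```

Both comparisons are expected to return TRUE. The second one suggests that predict(mdl, DT2) on the original fit (yhat_d in the question) was already the right prediction: the raw and orthogonal bases describe the same cubic, and only the coefficients differ. Calling predict.lm() without newdata (yhat_b) returns the fitted values for DT1 instead, and type = "terms" (yhat_c, yhat_e) returns centered per-term contributions rather than the full prediction.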