I need to train a regression model able to estimate the full predictive distribution instead of a point forecast estimates.
Is there any package for R to do this? I usually use the tidymodels framework.
The R function predict
does exactly this for a linear model. Note that a linear model assumes a normally distributed response, but as the variance is estimated form the data, the prediction estimated from the data follows a t-distribution. You thus need, for given predictor values x
, estimators for the mean value μ̂(x)
, the standard deviation s(x)
and the degrees of freedom df
, all of which are computed by predict.lm
:
# train model on data
fit <- lm(waiting ~ eruptions, faithful)
# prediction for new predictor value 10
p <- predict(fit, data.frame(eruptions=10), se.fit=TRUE)
mu <- p$fit
# beware that s^2 is a combination of error of mu and of the residual sd
s <- sqrt(p$se.fit^2 + p$residual.scale^2)
df <- p$df
Then the probability density value of the response (prediction) at point r
is
dt((r-mu) / s, df=df)