Search code examples
rregressionpredictionforecastingtidymodels

Is there an R package to train regression models that return probabilistic forecasts instead of point forecasts?


I need to train a regression model able to estimate the full predictive distribution instead of a point forecast estimates.

Is there any package for R to do this? I usually use the tidymodels framework.


Solution

  • The R function predict does exactly this for a linear model. Note that a linear model assumes a normally distributed response, but as the variance is estimated form the data, the prediction estimated from the data follows a t-distribution. You thus need, for given predictor values x, estimators for the mean value μ̂(x), the standard deviation s(x) and the degrees of freedom df, all of which are computed by predict.lm:

    # train model on data
    fit <- lm(waiting ~ eruptions, faithful)
    # prediction for new predictor value 10 
    p <- predict(fit, data.frame(eruptions=10), se.fit=TRUE)
    mu <- p$fit
    # beware that s^2 is a combination of error of mu and of the residual sd
    s <- sqrt(p$se.fit^2 + p$residual.scale^2)
    df <- p$df
    

    Then the probability density value of the response (prediction) at point r is

    dt((r-mu) / s, df=df)