Search code examples
rtime-seriesbayesian

Two methods of recovering fitted values from a Bayesian Structural Time Series model yield different results


Two conceptually plausible methods of retrieving in-sample predictions (or "conditional expectations") of y[t] given y[t-1] from a bsts model yield different results, and I don't understand why.

One method uses the prediction errors returned by bsts (defined as e=y[t] - E(y[t]|y[t-1]); source: https://rdrr.io/cran/bsts/man/one.step.prediction.errors.html):

library(bsts)
get_yhats1 <- function(fit){
  # One step prediction errors defined as e=y[t] - yhat (source: )
  # Recover yhat by y-e
  
  bsts.pred.errors <- bsts.prediction.errors(fit, burn=SuggestBurn(0.1, fit))$in.sample
  predictions <- t(apply(bsts.pred.errors, 1, function(e){fit$original.series-e}))
  return(predictions)
}

Another sums the contributions of all model component at time t.

get_yhats2 <- function(fit){
  burn <- SuggestBurn(0.1, fit)
  X     <- fit$state.contributions
  niter <- dim(X)[1]
  ncomp <- dim(X)[2]
  nobs  <- dim(X)[3]
  
  # initialize final fit/residuals matrices with zeros
  predictions <- matrix(data = 0, nrow = niter - burn, ncol = nobs)
  p0 <- predictions
  
  comps <- seq_len(ncomp)
  for (comp in comps) {
    # pull out the state contributions for this component and transpose to
    # a niter x (nobs - burn) array
    compX <- X[-seq_len(burn), comp, ]
    
    # accumulate the predictions across each component
    predictions <- predictions + compX
  }
  
  return(predictions)
}

Fit a model:

## Air passengers data
data("AirPassengers")
# 11 years, monthly data (timestep=monthly) --> 132 observations
Y <- stats::window(AirPassengers, start=c(1949,1), end=c(1959,12))
y <- log(Y)

ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons=12, season.duration=1)
bsts.model <- bsts(y, state.specification=ss, niter=500, family='gaussian')

Compute and compare predictions using each of the functions

p1 <- get_yhats1(bsts.model)
p2 <- get_yhats2(bsts.model)

# Compare predictions for t=1:5, first MCMC iteration:
p1[1,1:5]; p2[1,1:5]

Solution

  • I'm the author of bsts.

    The 'prediction errors' in bsts come from the filtering distribution. That is, they come from p(state | past data). The state contributions come from the smoothing distribution, i.e. p(state | all data). The filtering distribution looks backward in time, while the smoothing distribution looks both forward and backward. One typically needs the filtering distribution while using a fitted model, and the smoothing distribution while fitting the model in the first place.