
Does the forecast function within fable provide one-step forecasts?


As described here, making one-step forecasts on the test set avoids the inevitable increase in forecast variance as the forecast horizon grows. That section shows how to produce one-step forecasts on the test set from an already-trained model using the forecast package. Is there a similar way to produce one-step forecasts for test data with the newer fable package? Perhaps the new_data parameter described here handles this, but I am not sure, because the forecasts for h = 24 and new_data = x_test are identical below:

> library(fable)
> library(fabletools)
> library(dplyr)      # filter()
> library(lubridate)  # year()
> x <- USAccDeaths %>%
+   as_tsibble()
> x
# A tsibble: 72 x 2 [1M]
      index value
      <mth> <dbl>
 1 1973 Jan  9007
 2 1973 Feb  8106
 3 1973 Mar  8928
 4 1973 Apr  9137
 5 1973 May 10017
 6 1973 Jun 10826
 7 1973 Jul 11317
 8 1973 Aug 10744
 9 1973 Sep  9713
10 1973 Oct  9938
# … with 62 more rows
> x_train <- x %>% filter(year(index) < 1977)
> x_test <- x %>% filter(year(index) >= 1977)
> fit <- x_train %>% model(arima = ARIMA(log(value) ~ pdq(0, 1, 1) + PDQ(0, 1, 1)))
> fit
# A mable: 1 x 1
                      arima
                    <model>
1 <ARIMA(0,1,1)(0,1,1)[12]>
> nrow(x_test)
[1] 24
> forecast(fit, h = 24)$.mean
 [1]  7778.052  7268.527  7831.507  7916.845  8769.478  9144.790 10004.816  9326.874  8172.226
[10]  8527.355  8015.100  8378.166  7692.356  7191.343  7751.466  7839.085  8686.833  9062.247
[19]  9918.487  9250.101  8108.202  8463.933  7958.667  8322.497
> forecast(fit, new_data = x_test)$.mean
 [1]  7778.052  7268.527  7831.507  7916.845  8769.478  9144.790 10004.816  9326.874  8172.226
[10]  8527.355  8015.100  8378.166  7692.356  7191.343  7751.466  7839.085  8686.833  9062.247
[19]  9918.487  9250.101  8108.202  8463.933  7958.667  8322.497

Solution

  • Answer and code

    The model argument available for many models in the {forecast} package is equivalent to the refit() method in the {fable} package. When used with future data, it can be used to produce multiple one-step forecasts from a model.

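    library(forecast)
    # Fit on everything except the last 24 months
    fit <- head(USAccDeaths, -24) %>% 
      auto.arima()
    # Apply the fitted model to the held-out 24 months without re-estimating it
    fit_test <- tail(USAccDeaths, 24) %>% 
      Arima(model = fit)
    accuracy(fit_test)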
    #>                    ME     RMSE      MAE       MPE      MAPE      MASE
    #> Training set 22.45098 167.0648 85.59724 0.2382773 0.9327587 0.3298545
    #>                    ACF1
    #> Training set -0.0968173
    
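    library(fable)
    library(dplyr)
    us_accidental_deaths <- as_tsibble(USAccDeaths)
    # Fit on everything except the last 24 months
    fit <- head(us_accidental_deaths, -24) %>% 
      model(ARIMA(value))
    # Apply the fitted model to the held-out 24 months, keeping its coefficients
    fit_test <- refit(fit, tail(us_accidental_deaths, 24), reestimate = FALSE)
    # Reported as "Training" because the test data is what fit_test was applied to
    accuracy(fit_test)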
    #> # A tibble: 1 x 10
    #>   .model       .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
    #>   <chr>        <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
    #> 1 ARIMA(value) Training  22.5  167.  85.6 0.238 0.933 0.330 0.490 -0.0968
    

    Created on 2020-10-13 by the reprex package (v0.3.0)

    Explanation

    The fitted() values of a model are one-step-ahead forecasts, which can be used to evaluate 'training accuracy' (forecast accuracy on the training data). However, there's a catch: the model's parameters are estimated from the entire training set, so the training accuracy is better than what can be expected on new data (each fitted value comes from a model that has already seen the observations that follow it).
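
    As a minimal sketch of this, the block below fits the ARIMA model from the question to the first 48 months and pulls out its fitted() values. The names train and one_step_train are my own, chosen only for illustration.

```r
library(fable)
library(dplyr)

us_accidental_deaths <- as_tsibble(USAccDeaths)

# Hold out the last 24 months; train on the first 48
train <- head(us_accidental_deaths, -24)

# Same model specification as in the question
fit <- train %>%
  model(arima = ARIMA(log(value) ~ pdq(0, 1, 1) + PDQ(0, 1, 1)))

# One-step-ahead fitted values on the training data (one row per month)
one_step_train <- fitted(fit)
one_step_train

# Training accuracy computed from those one-step errors
accuracy(fit)
```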

    The forecast() function produces forecasts of future time points that the model has never seen. You can produce a one-step-ahead forecast with forecast(<mable>, h = 1), but this yields only a single forecast. Instead, we want to produce a one-step-ahead forecast, add one new observation to the model, and then produce another one-step-ahead forecast beyond that new observation (repeating until we run out of data).
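
    To make that procedure concrete, here is a slow but explicit sketch of it, assuming the model from the question. It grows the observed window by one month at a time, keeps the coefficients fixed with refit(..., reestimate = FALSE), and forecasts one step beyond the window. The names n_train, seen and one_step are illustrative, not part of fable.

```r
library(fable)
library(dplyr)

us_accidental_deaths <- as_tsibble(USAccDeaths)
n_train <- nrow(us_accidental_deaths) - 24

fit <- head(us_accidental_deaths, n_train) %>%
  model(arima = ARIMA(log(value) ~ pdq(0, 1, 1) + PDQ(0, 1, 1)))

# A single one-step-ahead forecast beyond the training data
forecast(fit, h = 1)

# Repeat: add i test observations, keep the coefficients, forecast one step
one_step <- lapply(0:23, function(i) {
  seen <- head(us_accidental_deaths, n_train + i)
  refit(fit, seen, reestimate = FALSE) %>%
    forecast(h = 1) %>%
    as_tibble() %>%
    select(index, .mean)
}) %>%
  bind_rows()

one_step
```

    This loop is only meant to show the idea; it runs 24 separate refits to spell out each step.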

    This is where the refit() function is useful. It takes an existing model and applies it to a new dataset. This refitting process involves computing one-step forecasts on the data (the fitted() values). With reestimate = FALSE, the model's estimated coefficients are not updated to suit the new 'future' data. This resolves the issue of the model coefficients containing information about the future values used to test forecast accuracy.
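
    If you want the one-step forecasts themselves rather than only the accuracy summary, a sketch along these lines should work for the model in the question. It assumes the same 48/24 split; x_train, x_test and one_step_test are illustrative names.

```r
library(fable)
library(dplyr)

us_accidental_deaths <- as_tsibble(USAccDeaths)
x_train <- head(us_accidental_deaths, -24)
x_test  <- tail(us_accidental_deaths, 24)

fit <- x_train %>%
  model(arima = ARIMA(log(value) ~ pdq(0, 1, 1) + PDQ(0, 1, 1)))

# Apply the trained model to the test data without re-estimating coefficients
fit_test <- refit(fit, x_test, reestimate = FALSE)

# One-step forecasts on the test set are the fitted values of the refit model
one_step_test <- fitted(fit_test)
one_step_test

# augment() adds the observed values and residuals alongside the fitted values
augment(fit_test)
```

    Because fit_test only sees the 24 test observations, its earliest fitted values do not draw on the training history. If that matters for your evaluation, one option is to refit on the full series instead and keep only the rows from the test period.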