Search code examples
rtime-seriesforecastinghierarchicalfable-r

Hierarchical Forecasting using R


I am using fable package to forecast for hierarchical time series and depth of all nodes is not equal. Use case is, forecasting contacts at country -> state -> district level. Forecast values have to add up to country level when aggregated (lower-level forecasts equate to the upperlevel forecasts.)
https://robjhyndman.com/papers/Foresight-hts-final.pdf
Given below is the code i tried and while forecasting on test data.

library(fable)
library(tsibble)
library(tsibbledata)
library(lubridate)
library(dplyr)

# selecting train data
train_df <- tourism %>%
  filter(year(Quarter) <= 2014 & Region %in% c("MacDonnell", "Melbourne"))

# selecting test data
test_df <- tourism %>%
  filter(year(Quarter) > 2014 & Region %in% c("MacDonnell", "Melbourne"))

# fitting ets model with reconcilliation
ets_fit <- train_df %>%
  aggregate_key(Purpose * (State / Region), Trips = sum(Trips)) %>%
  model(ets=ETS(Trips)) %>%
  reconcile(ets_adjusted = min_trace(ets))
# forecasting on test data
fcasts_test <- forecast(ets_fit, test_df)

Getting error as

Error: Provided data contains a different key structure to the models.
Run `rlang::last_error()` to see where the error occurred.

How can I solve this?


Solution

  • You changed the key structure using aggregate_key() before fitting the model, so the forecast key structure does not match the test set. You will need to create the test set after using aggregate_key().

    However, you cannot filter by one of the keys after creating aggregates, because then the aggregate information is incomplete.

    Here is an example that does what you want.

    library(fable)
    library(tsibble)
    library(tsibbledata)
    library(lubridate)
    library(dplyr)
    
    # Aggregate data as required
    agg_tourism <- tourism %>%
      filter(Region %in% c("MacDonnell", "Melbourne")) %>%
      aggregate_key(Purpose * (State / Region), Trips = sum(Trips))
    
    # Select training data
    train_df <- agg_tourism %>%
      filter(year(Quarter) <= 2014)
    
    # Select test data
    test_df <- agg_tourism %>%
      filter(year(Quarter) > 2014)
    
    # Fit ets model with reconcilliation
    ets_fit <- train_df %>%
      model(ets = ETS(Trips)) %>%
      reconcile(ets_adjusted = min_trace(ets))
    # forecasting on test data
    fcasts_test <- forecast(ets_fit, test_df)
    
    fcasts_test
    #> # A fable: 600 x 7 [1Q]
    #> # Key:     Purpose, State, Region, .model [50]
    #>    Purpose  State              Region     .model Quarter      Trips .mean
    #>    <chr*>   <chr*>             <chr*>     <chr>    <qtr>     <dist> <dbl>
    #>  1 Business Northern Territory MacDonnell ets    2015 Q1 N(5.1, 21)  5.12
    #>  2 Business Northern Territory MacDonnell ets    2015 Q2 N(5.1, 21)  5.12
    #>  3 Business Northern Territory MacDonnell ets    2015 Q3 N(5.1, 21)  5.12
    #>  4 Business Northern Territory MacDonnell ets    2015 Q4 N(5.1, 21)  5.12
    #>  5 Business Northern Territory MacDonnell ets    2016 Q1 N(5.1, 21)  5.12
    #>  6 Business Northern Territory MacDonnell ets    2016 Q2 N(5.1, 21)  5.12
    #>  7 Business Northern Territory MacDonnell ets    2016 Q3 N(5.1, 21)  5.12
    #>  8 Business Northern Territory MacDonnell ets    2016 Q4 N(5.1, 21)  5.12
    #>  9 Business Northern Territory MacDonnell ets    2017 Q1 N(5.1, 21)  5.12
    #> 10 Business Northern Territory MacDonnell ets    2017 Q2 N(5.1, 21)  5.12
    #> # … with 590 more rows
    
    fcasts_test %>%
      filter(Region == "Melbourne", Purpose == "Visiting") %>%
      autoplot(agg_tourism)
    

    Created on 2020-12-26 by the reprex package (v0.3.0)