
Hierarchical forecasting: problem generating the hts object


I want to do hierarchical forecasting as described in chapter 10 of Forecasting: Principles and Practice by Hyndman and Athanasopoulos (https://otexts.com/fpp2/).

My problem is that to produce this type of forecast (specifically with the bottom-up approach) I need to build an hts object from a matrix. For example:

If I have a data frame like this: [image: example of a data frame prior to the hts object]

I need to convert it to a matrix like this: [image: the matrix I need]

In this matrix, every row is a unit of time (it could be days, months, etc.).

My problem is that my data frame looks like this: [image: the problematic data frame]

One column is the date and the others are the categories for which I need to forecast sales. The problem is this: for supermarket=4, id_product=187, and id_label=a, the system registers movements on days 21 and 23 but nothing on day 22, so I need sales=0 on that day. In other words, I need a row like this: [image: the missing row]

How can I generate the matrix needed to create the hts object? Do I need to create the missing rows with 0? (I have thousands of missing rows, so doing it by hand would be a nightmare.)

Here is a reproducible example:

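date = c("2019-03-22", "2019-03-23", "2019-03-24", "2019-03-25")
id_supermarket = c(4, 4, 2, 2)
id_product = c(187, 187, 189, 190)
id_label = c("a", "a", "c", "d")
sales = c(21, 22, 23, 24)

# data.frame() keeps each column's type; cbind() would coerce everything to character
df = data.frame(date = as.Date(date), id_supermarket, id_product, id_label, sales)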

Thanks in advance.


Solution

  • I recommend you use the fable package instead of hts. It is more recent and much easier to use. Here is an example with your data.

    library(tsibble)
    library(fable)

    # Create tsibble
    df <- tibble(
      date = lubridate::ymd(c("2019-03-22", "2019-03-23", "2019-03-24", "2019-03-25")),
      id_supermarket = as.character(c(4, 4, 2, 2)),
      id_product = c(187, 187, 189, 190),
      id_label = c("a", "a", "c", "d"),
      sales = c(21, 22, 23, 24)
    ) %>%
      as_tsibble(index = date, key = c(id_supermarket, id_product, id_label)) %>%
      fill_gaps(.full = TRUE)
    
    # Forecast with reconciliation
    fc <- df %>%
      aggregate_key(id_supermarket * id_label, sales = sum(sales, na.rm = TRUE)) %>%
      model(
        arima = ARIMA(sales)
      ) %>%
      reconcile(
        arima = min_trace(arima)
      ) %>%
      forecast(h = "5 days")
    
    fc
    #> # A fable: 45 x 6 [1D]
    #> # Key:     id_supermarket, id_label, .model [9]
    #>    id_supermarket id_label .model date       sales .distribution
    #>    <chr>          <chr>    <chr>  <date>     <dbl> <dist>       
    #>  1 2              c        arima  2019-03-26  5.82 N(5.8, 44)   
    #>  2 2              c        arima  2019-03-27  5.82 N(5.8, 44)   
    #>  3 2              c        arima  2019-03-28  5.82 N(5.8, 44)   
    #>  4 2              c        arima  2019-03-29  5.82 N(5.8, 44)   
    #>  5 2              c        arima  2019-03-30  5.82 N(5.8, 44)   
    #>  6 2              d        arima  2019-03-26  6.34 N(6.3, 46)   
    #>  7 2              d        arima  2019-03-27  6.34 N(6.3, 46)   
    #>  8 2              d        arima  2019-03-28  6.34 N(6.3, 46)   
    #>  9 2              d        arima  2019-03-29  6.34 N(6.3, 46)   
    #> 10 2              d        arima  2019-03-30  6.34 N(6.3, 46)   
    #> # … with 35 more rows
    

    Created on 2020-02-01 by the reprex package (v0.3.0)
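
If you still want the plain matrix that hts expects, here is a minimal base R sketch, separate from the fable example above. It assumes the third date in the question is meant to be 2019-03-24 and that a day with no recorded row really means sales of 0. The names `series`, `all_days`, `day`, and `sales_matrix` are only illustrative and do not come from the original post.

```r
# Toy data from the question (third date assumed to be 2019-03-24)
df <- data.frame(
  date = as.Date(c("2019-03-22", "2019-03-23", "2019-03-24", "2019-03-25")),
  id_supermarket = c(4, 4, 2, 2),
  id_product = c(187, 187, 189, 190),
  id_label = c("a", "a", "c", "d"),
  sales = c(21, 22, 23, 24)
)

# One label per bottom-level series (supermarket x product x label)
df$series <- paste(df$id_supermarket, df$id_product, df$id_label, sep = "_")

# Every calendar day in the observed range, so days without any row still get a matrix row
all_days <- seq(min(df$date), max(df$date), by = "day")
df$day <- factor(as.character(df$date), levels = as.character(all_days))

# Sum sales per (day, series) cell; cells with no rows come back as NA
sales_matrix <- tapply(df$sales, list(day = df$day, series = df$series), sum)

# Assumption: no recorded movement means zero sales
sales_matrix[is.na(sales_matrix)] <- 0

sales_matrix
```

Each row is one day and each column one bottom-level series, so something like `ts(sales_matrix)` could then be passed to `hts::hts()` together with a description of the hierarchy. How that hierarchy is specified depends on how your series are nested, so treat this only as a starting point.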