I want to do hierarchical forecasting as described in chapter 10 of Hyndman's forecasting book: https://otexts.com/fpp2/
My problem is that to produce this type of forecast (specifically with the bottom-up approach) I need to build an hts object from a matrix. For example:
If I have a data frame like this: [Image: example of data frame prior to hts object]
I need to convert it to a matrix like this: [Image: matrix that I need]
In this matrix, every row is a unit of time (it could be days, months, etc.).
My problem is that my data frame looks like this: [Image: problem with data frame]
One column is the date and the others are the categories for which I need to forecast sales. The problem is this: for supermarket=4, id_product=187, and id_label=a, the system registers movements on days 21 and 23 but nothing on day 22, which means I need sales=0 on that day. In other words, I need a row like this: [Image: missing row]
How can I generate the matrix needed to create the hts object? Do I need to create the missing rows with 0? (I have thousands of missing rows, so it would be a nightmare to do it by hand)
Here is a reproducible example:
date=c("2019-03-22","2019-03-23","2019-04-24","2019-03-25")
id_supermarket=c(4,4,2,2)
id_product=c(187,187,189,190)
id_label=c("a","a","c","d")
sales=c(21,22,23,24)
# data.frame() keeps each column's type; cbind() would coerce everything to character
df=data.frame(date=as.Date(date),id_supermarket,id_product,id_label,sales)
Thanks in advance.
I recommend you use the fable package instead of hts. It is more recent and much easier to use. Here is an example with your data.
library(tsibble)
library(fable)

# Create tsibble
df <- tibble(
  date = lubridate::ymd(c("2019-03-22", "2019-03-23", "2019-03-24", "2019-03-25")),
  id_supermarket = as.character(c(4, 4, 2, 2)),
  id_product = c(187, 187, 189, 190),
  id_label = c("a", "a", "c", "d"),
  sales = c(21, 22, 23, 24)
) %>%
  as_tsibble(index = date, key = c(id_supermarket, id_product, id_label)) %>%
  fill_gaps(.full = TRUE)

# Forecast with reconciliation
fc <- df %>%
  aggregate_key(id_supermarket * id_label, sales = sum(sales, na.rm = TRUE)) %>%
  model(
    arima = ARIMA(sales)
  ) %>%
  reconcile(
    arima = min_trace(arima)
  ) %>%
  forecast(h = "5 days")
fc
#> # A fable: 45 x 6 [1D]
#> # Key:     id_supermarket, id_label, .model [9]
#>    id_supermarket id_label .model date       sales .distribution
#>    <chr>          <chr>    <chr>  <date>     <dbl> <dist>
#>  1 2              c        arima  2019-03-26  5.82 N(5.8, 44)
#>  2 2              c        arima  2019-03-27  5.82 N(5.8, 44)
#>  3 2              c        arima  2019-03-28  5.82 N(5.8, 44)
#>  4 2              c        arima  2019-03-29  5.82 N(5.8, 44)
#>  5 2              c        arima  2019-03-30  5.82 N(5.8, 44)
#>  6 2              d        arima  2019-03-26  6.34 N(6.3, 46)
#>  7 2              d        arima  2019-03-27  6.34 N(6.3, 46)
#>  8 2              d        arima  2019-03-28  6.34 N(6.3, 46)
#>  9 2              d        arima  2019-03-29  6.34 N(6.3, 46)
#> 10 2              d        arima  2019-03-30  6.34 N(6.3, 46)
#> # … with 35 more rows
Created on 2020-02-01 by the reprex package (v0.3.0)
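If you still want the wide matrix from the question (one row per day, one column per bottom-level series, zeros where nothing was sold), here is a minimal base R sketch. It assumes daily data and uses the same four rows as above. The names `series`, `day`, `all_days` and `sales_matrix` are made up for illustration and are not part of hts or fable.

```r
# Same example data as above, built with data.frame() so column types are kept
df <- data.frame(
  date = as.Date(c("2019-03-22", "2019-03-23", "2019-03-24", "2019-03-25")),
  id_supermarket = c(4, 4, 2, 2),
  id_product = c(187, 187, 189, 190),
  id_label = c("a", "a", "c", "d"),
  sales = c(21, 22, 23, 24)
)

# One identifier per bottom-level series (hypothetical naming scheme)
df$series <- paste(df$id_supermarket, df$id_product, df$id_label, sep = "_")

# Turn the date into a factor over the full daily range, so days without
# any recorded sales still get a row in the result
all_days <- seq(min(df$date), max(df$date), by = "day")
df$day <- factor(as.character(df$date), levels = as.character(all_days))

# Sum sales per (day, series); combinations with no data come back as NA
sales_matrix <- tapply(df$sales, list(day = df$day, series = df$series), sum)

# Treat "no record" as zero sales (an assumption about what missing rows mean)
sales_matrix[is.na(sales_matrix)] <- 0

sales_matrix
```

No rows have to be created by hand: the factor levels supply the missing days and `tapply()` supplies the missing combinations. Something like `ts(sales_matrix, frequency = 7)` could then be handed to the hts package, but how the column names should encode the hierarchy depends on your data, so that step is left out here.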