How to forecast an ARIMA with dynamic regression models for grouped data?


I'm trying to forecast a regression with ARIMA errors (a dynamic regression model) for several time series at once, using grouped data.

I'm new to tidy data, so basically I'm reproducing this example (https://cran.rstudio.com/web/packages/sweep/vignettes/SW01_Forecasting_Time_Series_Groups.html) with a multivariate ts and a multivariate model.

Here is a reproducible example:

library(tidyverse); library(tidyquant)
library(timetk); library(sweep)
library(forecast)
library(tsibble)
library(fpp3)

# using package data
bike_sales

# grouping data 
monthly_qty_by_cat2 <- bike_sales %>%
  mutate(order.month = as_date(as.yearmon(order.date))) %>%
  group_by(category.secondary, order.month) %>%
  summarise(total.qty = sum(quantity), price.m = mean(price))

# using nest 
monthly_qty_by_cat2_nest <- monthly_qty_by_cat2 %>%
  group_by(category.secondary) %>%
  nest()
monthly_qty_by_cat2_nest

# Forecasting Workflow
# Step 1: Coerce to a ts object class
monthly_qty_by_cat2_ts <- monthly_qty_by_cat2_nest %>%
  mutate(data.ts = map(.x       = data, 
                       .f       = tk_ts, 
                       select   = -order.month,  # take off date 
                       start    = 2011, 
                       freq     = 12))


# Step 2: modeling an ARIMA(y ~ x)
# make a function to map
modeloARIMA_reg <- function(y,x) {
  result <- ARIMA(y ~ x)
  return(list(result))}

# map the function 
monthly_qty_by_cat2_fit <- monthly_qty_by_cat2_ts %>%
  mutate(fit.arima = map(data.ts, modeloARIMA_reg))
monthly_qty_by_cat2_fit

Here I don't know whether map is using the right variable as y (the dependent variable), but I went ahead and tried the forecast anyway, and an error appears:

# Step 3: Forecasting the model
monthly_qty_by_cat2_fcast <- monthly_qty_by_cat2_fit %>%
  mutate(fcast.arima = map(fit.arima, forecast))

# this gives me the following error
# Error: Problem with `mutate()` input `fcast.arima`.
# x non-numeric argument to binary operator
# i Input `fcast.arima` is `map(fit.arima, forecast)`.
# i The error occured in group 1: category.secondary = "Cross Country Race".
# Run `rlang::last_error()` to see where the error occurred.
# In addition: Warning message:
#   In mean.default(x, na.rm = TRUE) :
#   argument is not numeric or logical: returning NA

Two questions emerge:

I don't know how to supply the mean of the independent variable (x) for each group;

and how to pass this new data as an argument to the forecast.

PS: The result doesn't need to be a tibble or nested; I just need the point forecast and the CI (total.qty, lo.95, hi.95).


Solution

  • This code solved the problem for me. It makes one forecast for each time series (a grouped tsibble) and uses each series' own mean value of the regressor as the future data in the forecast. Any comments are welcome.

    # MY FLOW
    monthly_qty_by_cat2 <- 
      sweep::bike_sales %>%
      mutate(order.month = yearmonth(order.date)) %>%
      group_by(category.secondary, order.month) %>%
      summarise(total.qty = sum(quantity), price.m = mean(price)) %>% 
      as_tsibble(index=order.month, key=category.secondary) # coerce to a tsibble
    # per-group mean of price.m, used as the future value of the regressor
    futuro <- monthly_qty_by_cat2 %>% 
      group_by(category.secondary) %>% 
      mutate(fut_x = mean(price.m)) %>% 
      do(price.m = head(.$fut_x,1))
    # do() returns a list column; convert it to a numeric vector
    futuro$price.m <- as.numeric(futuro$price.m)
    futuro
    # build 3 future periods per series and attach each group's mean price
    future_x <- new_data(monthly_qty_by_cat2, 3) %>%
      left_join(futuro, by = "category.secondary")
    future_x
    
    # model and forecast
    fc <- monthly_qty_by_cat2 %>% 
      group_by(category.secondary) %>% 
      model(ARIMA(total.qty ~ price.m))  %>%
      forecast(new_data=future_x)  %>% 
      hilo(level = 95) %>% 
      unpack_hilo("95%")
    fc
    
    # Tidy the forecast
    fc_tibble <- fc %>%  as_tibble() %>% select(-total.qty)
    fc_tibble
    # the end
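
For comparison, the same idea (one model per group, with the group's own mean of the regressor as its future value) can be sketched in base R without fable. This is only a rough illustration under stated assumptions: the data are simulated, the group names "A" and "B" and the helper `forecast_group` are hypothetical, and the AR(1) error order is fixed by hand, whereas `ARIMA()` in the code above selects the order automatically.

```r
# Base-R sketch: regression with AR(1) errors, fitted separately per group.
# All data below are simulated; nothing here comes from bike_sales.
set.seed(1)
n <- 48        # months of history per group (assumed)
h <- 3         # forecast horizon
groups <- c("A", "B")

# Simulate one data frame per group: y depends on x plus AR(1) noise
sim <- lapply(groups, function(g) {
  x <- rnorm(n, mean = 100, sd = 10)
  y <- 50 + 0.5 * x + as.numeric(arima.sim(list(ar = 0.5), n = n))
  data.frame(group = g, x = x, y = y)
})

# Hypothetical helper: fit y ~ x with AR(1) errors, then forecast h steps
# using the group's mean of x as the future regressor value
forecast_group <- function(d, h) {
  y <- ts(d$y, start = c(2011, 1), frequency = 12)
  fit <- arima(y, order = c(1, 0, 0), xreg = d$x)
  future_x <- rep(mean(d$x), h)
  p <- predict(fit, n.ahead = h, newxreg = future_x)
  z <- qnorm(0.975)  # normal quantile for a 95% interval
  data.frame(
    group     = d$group[1],
    step      = seq_len(h),
    total.qty = as.numeric(p$pred),
    lo.95     = as.numeric(p$pred - z * p$se),
    hi.95     = as.numeric(p$pred + z * p$se)
  )
}

# One row per group and forecast step
fc <- do.call(rbind, lapply(sim, forecast_group, h = h))
fc
```

The result is a plain data frame holding the point forecast and the 95% bounds for each group, which is the shape asked for in the question's PS.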
    
