
Out of Sample forecast with auto.arima() and xreg


I'm working on a forecasting model for which I have monthly data from 2014 to the current month (March 2018).

Part of my data is a column for billings and a column for quote amounts, e.g.:

Year   Quarter   Month    BILLINGS   QUOTES
2014   2014Q1    201401   100        500
2014   2014Q1    201402   150        600
2014   2014Q1    201403   200        700

I'm using this to predict monthly sales, and attempting to use xreg with the monthly number of quotes.

I reviewed the article below, but am missing something to accomplish what I'm trying to do: ARIMA forecasting with auto.Arima() and xreg

Question: Can somebody show an example of forecasting OUT OF SAMPLE using xreg? I understand that in order to accomplish this, you need to forecast your xreg variables out of sample, but I cannot figure out how to pass those future values in.

I tried using something like futurevalues$mean after predicting the values, but this did not work.

Here is my code:

sales = read.csv('sales.csv')

# Below, I'm creating a training set for the models through 
#  December 2017 (48 months).
train = sales[sales$TRX_MON<=201712,]

# I will also create a test set for our data from January 2018 (3 months)
test = sales[sales$TRX_MON>201712,]

dtstr2 <- ts(train, start=2014, frequency=12)
dtste2 <- ts(test, start=2018, frequency=12)

fit2 <- auto.arima(dtstr2[,"BILLINGS"], xreg=dtstr2[,"QUOTES"])
fcast2 <- forecast(fit2, xreg=dtste2[,"QUOTES"], h=24)
fcast2

The code above works, but only gives me a 3-month forecast, e.g.

                  Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jan 2018          70                60       100      50       130
Feb 2018          80                70       110      60       140
Mar 2018          90                80       120      70       150

I have scoured as many blogs and topics as I could find, looking for an example of using auto.arima with an out-of-sample forecast of an xreg variable, and cannot find any that have done this.

Can anybody help?

Thank you much.


Solution

  • Here is an MWE for out-of-sample prediction of a time series with unknown covariates. It relies on the data provided for this question as well as @Raad's excellent answer.

    library("forecast")
    
    dta = read.csv("~/stackexchange/data/xdata.csv")[1:96,]
    dta <- ts(dta, start = 1)
    
    # to illustrate out-of-sample forecasting with covariates, let's split the data
    train <- window(dta, end = 90)
    test <- window(dta, start = 91)
    
    # fit model
    covariates <- c("Customers", "Open", "Promo")
    fit <- auto.arima(train[,"Sales"], xreg = train[, covariates])
    

    Forecast using the covariate values from the test data:

    fcast <- forecast(fit, xreg = test[, covariates])
    

    But what if we do not know the values of Customers yet? The desired goal is to forecast Customers and then use those forecast values in the forecast of Sales. Open and Promo are under the control of the manager, so will be "fixed" in the forecast.

    customerfit <- auto.arima(train[,"Customers"], xreg = train[, c("Open","Promo")])
    

    I will attempt to forecast 2 weeks out, and assume there is no promotion.

    newdata <- data.frame(Open = rep(c(1,1,1,1,1,1,0), times = 2),
                          Promo = 0)

    # as.matrix() because recent versions of forecast expect a numeric matrix for xreg
    customer_fcast <- forecast(customerfit, xreg = as.matrix(newdata))
    
    # the forecast values of Customers are in `customer_fcast$mean`
    
    newdata$Customers <- as.vector(customer_fcast$mean)
    

    It is critical to get the newdata columns in the same order as in the original data! forecast() matches regressors by position.

    sales_fcast <- forecast(fit, xreg = as.matrix(newdata)[,c(3,1,2)])
    plot(sales_fcast)
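
    If you would rather not rely on numeric positions such as c(3,1,2), one option is to select the columns by name. The following is only a minimal sketch: the constant 600 is a made-up placeholder standing in for `customer_fcast$mean`, so that the snippet runs without the data file.

    ```r
    # Same column order that was used when fitting the Sales model
    covariates <- c("Customers", "Open", "Promo")

    # Future regressors, built in a different column order than the training data
    newdata <- data.frame(
      Open      = rep(c(1, 1, 1, 1, 1, 1, 0), times = 2),
      Promo     = 0,
      Customers = 600  # hypothetical placeholder for the forecast Customers values
    )

    # Selecting by name reorders the columns to match the training xreg
    xreg_future <- as.matrix(newdata[, covariates])
    colnames(xreg_future)
    ```

    The resulting `xreg_future` can then be passed as `xreg` in place of `as.matrix(newdata)[,c(3,1,2)]`.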
    

    Created on 2018-03-29 by the reprex package (v0.2.0).
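
For the monthly billings/quotes case in the question, the same two-step idea might look roughly like the sketch below. The data here is simulated (the real sales.csv is not available), and the names `quotes`, `billings`, `xreg_train` and `xreg_future` are made up for illustration. Note that when `xreg` is supplied to forecast(), the `h` argument is ignored and the horizon is the number of rows of `xreg`, which is why the question's code returned only 3 months despite `h=24`.

```r
library(forecast)

set.seed(42)

# Simulated stand-in for the question's data: 48 months, Jan 2014 to Dec 2017
n <- 48
quotes <- ts(500 + 5 * seq_len(n) + rnorm(n, sd = 20),
             start = c(2014, 1), frequency = 12)
billings <- ts(0.2 * quotes + rnorm(n, sd = 10),
               start = c(2014, 1), frequency = 12)

# Step 1: model the regressor on its own and forecast it 24 months ahead
quotes_fit <- auto.arima(quotes)
quotes_fcast <- forecast(quotes_fit, h = 24)

# Step 2: fit billings with quotes as an external regressor
xreg_train <- cbind(QUOTES = as.numeric(quotes))
billings_fit <- auto.arima(billings, xreg = xreg_train)

# Step 3: pass the forecast quotes as the future xreg values;
# the horizon is taken from the number of rows (24), not from h
xreg_future <- cbind(QUOTES = as.numeric(quotes_fcast$mean))
billings_fcast <- forecast(billings_fit, xreg = xreg_future)

length(billings_fcast$mean)
```

One caveat: the prediction intervals from the second forecast treat the future quotes as known, so they do not reflect the uncertainty in the quotes forecast itself and are likely too narrow.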