Search code examples
rtime-seriesforecasting

Unable to pass xreg values to hts ARIMA forecast


I am trying to pass xreg arguments in my forecast but keep running into an error which says:

fc=forecast(gy,fmethod="arima",h=days,method="bu",xreg=z,newxreg=fz)

Error in as.matrix(newxreg) %*% coefs : non-conformable arguments
In addition: Warning message:
In cbind(intercept = rep(1, n), xreg) :
  number of rows of result is not a multiple of vector length (arg 1)

I don't understand the error message. Since I'm unable to post the original data, I created a mock set of data in the code below. For some reason, the mock data works fine. When I comment out the temp lines and use my csv file instead, I get the above error.

library(hts)

#Get data from file
#data=read.csv("C:/mydatafile.csv")

#TEMP: Create mock data
temp.dates=rep(seq(1,60),times=5)
temp.pl2=c(paste("A",rep(seq(1,3),each=60),sep=""),rep("A4",120))
temp.pl3=paste("B",rep(seq(1,5),each=60),sep="")

data=data.frame(DateId=temp.dates,
                ProductLevel2Code=temp.pl2,
                ProductLevel3Code=temp.pl3,
                SalesNetAmount=rnorm(300,mean=15000,sd=2000),
                TotalViews=rnorm(300,mean=50000,sd=3000))


#Create time series using sales dollars
r = length(unique(data$DateId))
c = length(unique(data$ProductLevel3Code))
myts=ts(matrix(data$SalesNetAmount,ncol=c,nrow=r),frequency=7)

#Assign column names to matrix
clnames <- unique(paste(data$ProductLevel2Code, # PL2
                        data$ProductLevel3Code, # PL3
                        sep=""))
colnames(myts)=clnames

#Create heirarchial time series based on 4 character codes
gy=hts(myts,characters=c(2,2))

#Get total views for site by day
data.views=aggregate(data[,c("TotalViews")],by=list(data$DateId),FUN=sum)$x

#Get xreg values
z= matrix(data.views,nrow=60)

#Create newxreg values
days=14
fz = matrix(rep(mean(data.views),days),nrow=days)

fc=forecast(gy,fmethod="arima",h=days,method="bu",xreg=z,newxreg=fz)
plot(fc)

I've checked the original data and there are 42 ProductLevel3 codes, each with 60 date values. There are no NAs or missing data. The CSV has 2,520 data rows which equal 60x42. The CSV file structure is identical to the data frame created from the above code.

What am I missing??

Update

Just to try in Excel, I replaced SalesNetAmount & TotalViews with random numbers, resaved the CSV and had no issues running the R script. I tried resaving the original CSV as-is however encountered the error again. Leading me to believe the numbers are the source of my issue. Some product lines have very little sales/traffic so there are quite a bit of 0s, but I tried adding 1 to the entire data set for non-zero values and the error still exists.


Solution

  • I solved the direct question so this is technically the answer while I don't completely understand why.

    I read through the HTS code on using the trace() function and found the line causing issues:

    else if (fmethod == "arima") {
                    models <- auto.arima(x, lambda = lambda, xreg = xreg, 
                      parallel = FALSE, ...)
                    out$pfcasts <- forecast(models, h = h, xreg = newxreg)$mean
                }
    

    After some debugging, it turns out HTS was failing on a specific series. When I inspected the series it was for a product category which happened to have 0 sales for the entire 60 day period. Apparently auto.arima cannot handle external regressors when the time series is completely static.

    You can replicate this by doing

    test.data=rep(1,60)
    z=as.matrix(rnorm(60,100,20),nrow=60)
    fz=as.matrix(rnorm(14,100,20),nrow=14)
    
    #Does not work
    fit.bad=auto.arima(test.data,xreg=z)
    forecast.bad=forecast(fit.bad,h=14,xreg=fz)
    plot(forecast.bad)
    
    
    #Works
    fit.good=auto.arima(test.data)
    forecast.good=forecast(fit.good,h=14)
    plot(forecast.good)
    

    It doesn't matter what number it is, but if the time series is completely static, the first auto.arima() will give you the error

    Error in as.matrix(newxreg) %*% coefs : non-conformable arguments
    

    However changing just a single value in the time series to 2 (or any other number) will allow both ARIMA functions to work fine.

    It seems the secondary error message in my original question about

    In cbind(intercept = rep(1, n), xreg) :
      number of rows of result is not a multiple of vector length (arg 1)
    

    ...was a complete red herring added in by HTS. The source of the error lies in the auto.arima function. Removing all product series with static sales has resolved my issue.