Search code examples
rtime-seriesmoving-averagemodel-fittingautoregressive-models

Auto.arima() function does not result in white noise. How else should I go about modeling data


Here is the plot of the initial data (after performing a log transformation). log plot

It is evident there is both a linear trend as well as a seasonal trend. I can address both of these by taking the first and twelfth (seasonal) difference: diff(diff(data), 12). After doing so, here is the plot of the resulting data

first_seasonal_diff plot.

This data does not look great. While the mean in constant, we see a funneling effect as time progresses. Here are the ACF/PACF:ACFPACF.

Any suggestions for possible fits to try. I used the auto.arima() function which suggested an ARIMA(2,0,2)xARIMA(1,0,2)(12) model. However, once I took the residuals from the fit, it was clear there was still some sort of structure in them. Here is the plot of the residuals from the fit as well as the ACF/PACF of the residuals. residualPlotacfResidspacfResids

There does not appear to be a seasonal pattern regarding which lags have spikes in the ACF/PACF of residuals. However, this is still something not captured by the previous steps. What do you suggest I do? How could I go about building a better model that has better model diagnostics (which at this point is just a better looking ACF and PACF)?

Here is my simplified code thus far:

    library(TSA)
    library(forecast)
    beer <- read.csv('beer.csv', header = TRUE)
    beer <- ts(beer$Production, start = c(1956, 1), frequency = 12)

    # transform data
    boxcox <- BoxCox.ar(beer) # 0 in confidence interval
    beer.log <- log(beer)
    firstDifference <- diff(diff(beer.log), 12) # get rid of linear and 
    # seasonal trend
    acf(firstDifference)
    pacf(firstDifference)
    eacf(firstDifference)
    plot(armasubsets(firstDifference, nar=12, nma=12))

    # fitting the model
    auto.arima(firstDifference, ic = 'bic') # from forecasting package
    modelFit <- arima(firstDifference, order=c(1,0,0),seasonal
    =list(order=c(2, 0, 0), period = 12))

    # assessing model
    resid <- modelFit$residuals                     
    acf(resid, lag.max = 15)
    pacf(resid, lag.max = 15)

Here is the data, if interested (I think you can use an html to csv converter if you would like): https://docs.google.com/spreadsheets/d/1S8BbNBdQFpQAiCA4J18bf7PITb8kfThorMENW-FRvW4/pubhtml


Solution

  • Jane,

    There are a few things going on here.

    Instead of logs, we used the tsay variance test which shows that the variance increased after period 118. Weighted least squares deals with it.

    Tsay variance test

    March becomes higher beginning at period 111. An alternative to an ar12 or seasonal differencing is to identify seasonal dummies. We found that 7 of the 12 months were unusual with a couple level shifts, an AR2 with 2 outliers.

    Model

    Here is the fit and forecasts.Actual, fit and Forecasts

    Here are the residuals.Residuals

    ACF of residualsACF Residuals

    Note: I am a developer of the software Autobox. All models are wrong. Some are useful.

    Here is Tsay's paper http://onlinelibrary.wiley.com/doi/10.1002/for.3980070102/abstract