Here is the plot of the initial data (after performing a log transformation).
It is evident there is both a linear trend as well as a seasonal trend. I can address both of these by taking the first and twelfth (seasonal) difference: diff(diff(data), 12). After doing so, here is the plot of the resulting data
This data does not look great. While the mean in constant, we see a funneling effect as time progresses. Here are the ACF/PACF:.
Any suggestions for possible fits to try. I used the auto.arima() function which suggested an ARIMA(2,0,2)xARIMA(1,0,2)(12) model. However, once I took the residuals from the fit, it was clear there was still some sort of structure in them. Here is the plot of the residuals from the fit as well as the ACF/PACF of the residuals.
There does not appear to be a seasonal pattern regarding which lags have spikes in the ACF/PACF of residuals. However, this is still something not captured by the previous steps. What do you suggest I do? How could I go about building a better model that has better model diagnostics (which at this point is just a better looking ACF and PACF)?
Here is my simplified code thus far:
library(TSA)
library(forecast)
beer <- read.csv('beer.csv', header = TRUE)
beer <- ts(beer$Production, start = c(1956, 1), frequency = 12)
# transform data
boxcox <- BoxCox.ar(beer) # 0 in confidence interval
beer.log <- log(beer)
firstDifference <- diff(diff(beer.log), 12) # get rid of linear and
# seasonal trend
acf(firstDifference)
pacf(firstDifference)
eacf(firstDifference)
plot(armasubsets(firstDifference, nar=12, nma=12))
# fitting the model
auto.arima(firstDifference, ic = 'bic') # from forecasting package
modelFit <- arima(firstDifference, order=c(1,0,0),seasonal
=list(order=c(2, 0, 0), period = 12))
# assessing model
resid <- modelFit$residuals
acf(resid, lag.max = 15)
pacf(resid, lag.max = 15)
Here is the data, if interested (I think you can use an html to csv converter if you would like): https://docs.google.com/spreadsheets/d/1S8BbNBdQFpQAiCA4J18bf7PITb8kfThorMENW-FRvW4/pubhtml
Jane,
There are a few things going on here.
Instead of logs, we used the tsay variance test which shows that the variance increased after period 118. Weighted least squares deals with it.
March becomes higher beginning at period 111. An alternative to an ar12 or seasonal differencing is to identify seasonal dummies. We found that 7 of the 12 months were unusual with a couple level shifts, an AR2 with 2 outliers.
Here is the fit and forecasts.
Note: I am a developer of the software Autobox. All models are wrong. Some are useful.
Here is Tsay's paper http://onlinelibrary.wiley.com/doi/10.1002/for.3980070102/abstract