I'm using R with version 5.4 of Rob Hyndman's forecast package. It's a really nice package, but it seems to be behaving oddly, producing wildly different forecasts for similar data. I'm pretty sure this is related to the warning message generated for the second data set below, but I'm not sure how to fix it.
library(forecast)
v <- vector("numeric")
v <- append(v,0.0)
v <- append(v,115.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,115.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,115.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,117.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,117.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,117.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,113.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,112.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,120.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,119.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
series <- ts(v, frequency=12)
series
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 0 115 0 0 0 115 0 0 0 115 0 0
2 0 0 117 0 0 0 117 0 0 0 117 0
3 0 0 0 0 113 0 0 0 112 0 0 0
4 0 0 0 0 120 0 0 0 119 0 0 0
a <- auto.arima(series)
# Note that no warning is raised here
v <- vector("numeric")
v <- append(v,0.0)
v <- append(v,109.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,120.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,114.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,125.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,135.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,130.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,104.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,114.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,126.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,114.0)
v <- append(v,0.0)
v <- append(v,0.0)
v <- append(v,0.0)
series <- ts(v, frequency=12)
a <- auto.arima(series)
Warning message:
In max(which(abs(testvec) > 1e-08)) :
no non-missing arguments to max; returning -Inf
series
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 0 109 0 0 0 120 0 0 0 114 0 0
2 0 0 125 0 0 0 135 0 0 0 130 0
3 0 0 0 0 104 0 0 0 114 0 0 0
4 0 0 0 0 126 0 0 0 114 0 0 0
You can see that the data sets are almost identical, but the second throws a warning message. If you copy and paste into R, you can see that the forecasts for the second are off as well.
Any ideas on how to fix this?
* Update *
Note that this example was put together after about a day of R experience and is simply logger text. Using JRI (the Java interface to R), I came up with the following to simulate an ArrayList. Early prototypes aren't always the prettiest.
eval(re, "v <- vector(\"numeric\")");
for (int i = 0; i < months.size(); i++) {
eval(re, "v <- append(v," + months.get(i) + ")");
}
Here is your data again, entered much more efficiently. Why use append statements?
library(forecast)
v1 <- ts(c(0, 115, 0, 0, 0, 115, 0, 0, 0, 115, 0, 0, 0, 0, 117, 0, 0,
0, 117, 0, 0, 0, 117, 0, 0, 0, 0, 0, 113, 0, 0, 0, 112, 0, 0,
0, 0, 0, 0, 0, 120, 0, 0, 0, 119, 0, 0, 0), frequency=12)
fit1 <- auto.arima(v1)
plot(forecast(fit1))
v2 <- ts(c(0, 109, 0, 0, 0, 120, 0, 0, 0, 114, 0, 0, 0, 0, 125, 0, 0,
0, 135, 0, 0, 0, 130, 0, 0, 0, 0, 0, 104, 0, 0, 0, 114, 0, 0,
0, 0, 0, 0, 0, 126, 0, 0, 0, 114, 0, 0, 0), frequency=12)
fit2 <- auto.arima(v2)
plot(forecast(fit2))
The warning comes up because auto.arima tries to fit a model that happens to have all estimated coefficients equal to zero. The next release of the forecast package (available at https://github.com/robjhyndman/forecast) fixes this warning.
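The mechanics of the warning can be reproduced in base R without the forecast package. This is only an illustration: the name `testvec` is taken from the warning text, and filling it with zeros is an assumption about what auto.arima holds at that point, not a copy of the package's internals.

```r
# Assumed stand-in for a vector of estimated coefficients that are all zero
testvec <- c(0, 0, 0)

# No element exceeds the tolerance, so which() returns an empty integer vector
idx <- which(abs(testvec) > 1e-08)

# max() of an empty vector returns -Inf and normally emits
# "no non-missing arguments to max; returning -Inf";
# the warning is suppressed here only to keep the demo quiet
res <- suppressWarnings(max(idx))
```

So the warning itself only signals that nothing in the vector was distinguishable from zero.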
An ARIMA model is inappropriate for both of these time series in any case. Try to understand what is causing the zeros and non-zeros, and build a model that is appropriate to the data. For example, it might include two processes -- one for the time between non-zero values and one for the magnitude of the non-zero values.
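As a rough sketch of the two-process idea, the series can be split in base R into the gaps between non-zero values and the sizes of those values. The names `idx`, `gaps`, `size`, `mean_gap` and `mean_size` are made up for this illustration, and summarising each process by its mean is a simplifying assumption, not a recommended model.

```r
# Second series from the question
v2 <- c(0, 109, 0, 0, 0, 120, 0, 0, 0, 114, 0, 0, 0, 0, 125, 0, 0,
        0, 135, 0, 0, 0, 130, 0, 0, 0, 0, 0, 104, 0, 0, 0, 114, 0, 0,
        0, 0, 0, 0, 0, 126, 0, 0, 0, 114, 0, 0, 0)

idx  <- which(v2 != 0)   # positions of the non-zero observations
gaps <- diff(idx)        # process 1: time between consecutive non-zero values
size <- v2[idx]          # process 2: magnitude of the non-zero values

# Naive summaries of each process (assumption: plain means, no dynamics)
mean_gap  <- mean(gaps)
mean_size <- mean(size)

print(gaps)
print(size)
```

Each of the two shorter series can then be modelled separately. Croston's method, available as croston() in the forecast package, is built on a similar decomposition and may be worth trying on data like this.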