Search code examples
rtime-seriesarimaets

Problems Interpreting fitted values from ETS() and AUTO.ARIMA() models in R


I'm stuck into this question that I can't solve.

When using AirPassengers data and model it through ETS() and AUTO.ARIMA(), the fitted values seems reasonable well fitted to observed values:

library(forecast)

a <- ts(AirPassengers, start = 1949, frequency = 12)
a <- window(a, start = 1949, end = c(1954,12), frequency = 12)

fit_a_ets <- ets(a)
fit_a_arima <- auto.arima(a)

plot(a)
lines(fit_a_ets$fitted, col = "blue")
lines(fit_a_arima$fitted, col = "red")

Plot from AirPassengers and fitted models

When I tried same code on my data, it seems dislocated 1 period:

b <- c(1237,1982,1191,1163,1418,1687,2331,2181,1943,1782,177,1871,391,1397,734,712,1006,508,368,767,675,701,989,725,1292,983,1094,1105,928,1246,1604,1163,1390,959,1630,789,1173,910,875,718,655,606,968,716,476,476,655,499,544,1250,359,386,458,947,542,953,1450,1195,1317,957,778,1030,1399,1119,3142,1024,1537,1321,2062,1897,2094,2546,1796,2089,1194,896,727,599,785,674,828,311,375,315,365,314,126,315,372,666,596,589,001,613,498,635,644,1018,873,900,502,121,293,259,311,169,378,153,24,115,250,565,349,201,393,83,327,325,185,307,501,194)
b <- ts(b, start = 1949, frequency = 12)
b <- window(b, start = 1949, end = c(1954,12), frequency = 12)

fit_b_ets <- ets(b)
fit_b_arima <- auto.arima(b)

plot(b)
lines(fit_b_ets$fitted, col = "blue")
lines(fit_b_arima$fitted, col = "red")

Plot from my data and fitted models

Does anyone know why?

Tried here https://otexts.com/fpp2/index.html and I didn't get why this happens.

I thought it would be because it's not well fitted into my data, but for others set's of data, same occurs. For example, figure 7.1 from https://otexts.com/fpp2/ses.html.


Solution

  • This is typical.

    In the context of forecasting, the "fitted" value is the one-step-ahead forecast. For many different types of series, the best that we can do is something that's close to the latest observation, plus a small adjustment. This makes it look like the "fitted" value lags by 1 period because it is then usually quite close to the previous observed value.

    Asking why the fitted series lags is like asking "why can't we know the future before it happens?". It's simply not that easy, and it doesn't indicate that the model is necessarily inadequate (it may not be possible to do better).

    Plots comparing the time series of observations and fitted values are rarely of any use for forecasting; they always essentially look like this. It also makes it difficult to judge the vertical distance between the lines, which is what you actually care about (the forecasting error). Better to plot the forecasting error directly.

    The AirPassengers series is unusual because it is extremely easy to forecast based on its seasonality. Most series you will encounter in the wild are not quite like this.