Error in arima of R: too few non-missing observations


I am using arima() and auto.arima() in R to forecast sales. The data is weekly and covers three years.

My code looks like this:

library(forecast) # provides auto.arima() and Arima()

x<-c(1571,1501,895,1335,2306,930,2850,1380,975,1080,990,765,615,585,838,555,1449,615,705,465,165,630,330,825,555,720,615,360,765,1080,825,525,885,507,884,1230,342,615,1161, 1585,723,390,690,993,1025,1515,903,990,1510,1638,1461.67,1082,1075,2315,1014,2140,1572,794,1363,1184,1248,1344,1056,816,720,896,608,624,560,512,304,640,640,704,1072,768, 816,640,272,1168,736,1003,864,658.67,768,841,1727,944,848,432,704,850.67,1205,592,1104,976,629,814,1626,933.33,1100.33,1730,2742,1552,1038,826,1888,1440,1372,824,1824,1392,1424,768,464, 960,320,384,512,478,1488,384,338.67,176,624,464,528,592,288,544,418.67,336,752,400,1232,477.67,416,810.67,1256,1040,823,240,1422,704,718,1193,1541,1008,640,752, 1008,864,1507,4123,2176,899,1717,935)

length_data<-length(x)

length_train<-round(length_data*0.80)

forecast_period<-length_data-length_train

train_data<-x[1:length_train]

train_data<-ts(train_data,frequency=52,start=c(1,1))

validation_data<-x[(length_train+1):length_data]

validation_data<-ts(validation_data,frequency=52,start=c(ceiling((length_train)/52),((length_train)%%52+1)))

arima_output<-auto.arima(train_data) # fit the ARIMA Model

arima_validate <- Arima(x=validation_data,model=arima_output)

Error:

Error in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, :

too few non-missing observations

What am I doing wrong? What does "too few non-missing observations" mean? I have searched the net but did not find a good explanation.

Thanks for any kind of help!


Solution

  • arima_output is a seasonal ARIMA model:

    > arima_output
    Series: train_data 
    ARIMA(1,0,1)(0,1,0)[52]
    

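    Arima() then attempts to refit this particular model to validation_data. But the model includes one seasonal difference (D=1) at lag 52, and seasonal differencing subtracts from each observation the value 52 weeks earlier, so it uses up the first 52 observations. validation_data holds only the last 20% of three years of weekly data, well under one year, so nothing is left to fit and arima() stops with "too few non-missing observations". To refit a seasonally differenced model to a time series, you need more than one full seasonal period of observations.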
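
    The effect of seasonal differencing on a short series can be checked in base R. This is only a sketch with made-up stand-in values: the length 32 is an assumption (roughly 20% of three years of weekly data), and the variable names are not from the original code.

    ```r
    period  <- 52   # weekly data with a yearly season
    n_valid <- 32   # assumed length of the validation series (illustrative)

    short_series  <- seq_len(n_valid)       # stand-in for validation_data
    longer_series <- rep(short_series, 2)   # 64 points, more than one season

    # Seasonal differencing subtracts the value 52 steps earlier,
    # so a series with 52 or fewer points has nothing left afterwards.
    n_left_short  <- length(diff(short_series,  lag = period))
    n_left_longer <- length(diff(longer_series, lag = period))

    print(c(short = n_left_short, longer = n_left_longer))
    ```

    The short series keeps no observations after differencing, while the longer one keeps a few, which is why only the longer one can be refitted.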

    As an illustration, note that Arima() refits the model without errors to a time series that is twice as long as validation_data:

    validation_data <- x[(length_train+1):length_data]
    validation_data<-ts(rep(validation_data,2),frequency=52,
      start=c(ceiling((length_train)/52),((length_train)%%52+1)))
    arima_validate <- Arima(x=validation_data,model=arima_output)
    

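    One way of dealing with this is to keep auto.arima() from using seasonal differencing, by specifying D=0 (seasonal AR or MA terms can still be selected; seasonal=FALSE would restrict the search to nonseasonal models):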

    validation_data <- x[(length_train+1):length_data]
    validation_data<-ts(validation_data,frequency=52,
      start=c(ceiling((length_train)/52),((length_train)%%52+1)))
    arima_output<-auto.arima(train_data, D=0) # fit the ARIMA Model
    arima_validate <- Arima(x=validation_data,model=arima_output)
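
    If the aim is to measure out-of-sample accuracy (an assumption about the intent, not something stated in the question), another option is to keep the seasonal model, forecast from the training fit, and compare the forecasts with the held-out values instead of refitting to them. The sketch below uses synthetic data rather than the original sales figures and assumes the forecast package is installed; the names y, train, test, fit, fc and acc are illustrative.

    ```r
    library(forecast)  # assumed installed; provides Arima(), forecast(), accuracy()

    set.seed(1)
    # Synthetic stand-in for three years of weekly sales (not the original data)
    y <- 1000 + 300 * sin(2 * pi * seq_len(158) / 52) + rnorm(158, sd = 150)

    n_train <- round(length(y) * 0.80)
    train <- ts(y[1:n_train], frequency = 52, start = c(1, 1))
    test  <- y[(n_train + 1):length(y)]   # held-out values as a plain vector

    # Same model structure as reported above: ARIMA(1,0,1)(0,1,0)[52]
    fit <- Arima(train, order = c(1, 0, 1), seasonal = c(0, 1, 0))

    # Forecast the whole hold-out horizon and compare with the actual values
    fc  <- forecast(fit, h = length(test))
    acc <- accuracy(fc, test)
    print(acc)
    ```

    Unlike Arima(x=..., model=...), which yields one-step-ahead fitted values on the new series, this evaluates multi-step forecasts, so the two approaches answer slightly different questions.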
    

    So this did turn out to be more of a CrossValidated question...