Search code examples
rfunctionloopsforecastingarima

Developing forecasting function/loop


I am a beginner in R and appreciate any help or tip to develop a function/loop to automate forecasting process below: Here's a dummy data set

> class(stack_help) 
[1] "data.frame" 
> stack_help
    OO    GG      CC      DD
1   198.12 60.56   265.5  271.24
2   145.68 52.28   328.9  427.68
3   106.48 47.08  380.24  695.60
4    83.16 43.52  443.94  934.30
5    89.72 46.68   484.6 1084.26
6    86.48 35.46  415.56  924.68
7    93.68 24.40  376.42  798.14
8   101.70 22.68  260.42  427.72
9   115.88 22.00  228.26  245.72
10  137.24 21.60   212.7  140.64
11  129.82 18.78  230.02   46.04
12  145.00 17.62  220.74   47.16
13  135.38 18.84  245.52  143.28
14  146.38 20.68  322.18  490.20
15  154.08 19.48  374.16  621.48
16  149.34 22.68  484.28  999.50
17  152.74 28.90  533.26 1223.58
18  148.62 27.44  456.76  974.44
19  158.54 23.90  417.52  820.54
20  169.96 27.08  306.16  498.02
21  152.50 33.74   283.1  309.22
22  149.68 38.44  224.54  123.82
23  149.48 38.94  215.28   30.26
24  153.38 36.24  193.18   75.46
25  155.58 37.88  243.34  228.92
26  165.84 37.00  318.08  528.58
27  171.34 38.96   393.6  707.04
28  183.60 48.20  531.62 1169.40
29  192.58 44.46  507.96 1037.22
30  207.92 43.52     435  956.96
31  228.88 47.44  399.58  788.78
32  246.14 45.74  262.84  397.66
33  228.92 45.98   240.8  255.32
34  227.52 45.22  211.44   96.02
35  232.92 43.02  203.08   62.18
36  220.16 43.88  188.56   63.74
37  221.76 46.78  210.58  131.28
38  218.94 45.10  272.36  438.64
39  221.00 47.48  351.58  689.90
40  215.82 44.68  402.82  854.80
41  222.32 43.74  435.06 1013.92
42  239.40 52.26  474.24 1128.04
43  249.86 47.62  324.92  689.40
44  240.92 49.60  289.82  538.98
45  221.04 48.40  218.74  256.80
46  191.18 47.34  192.36  136.84
47  206.28 48.66  188.22   60.60
48  226.68 48.12  174.54   58.36
49  226.76 51.66  204.26  190.58
50  223.94 53.40  272.22  454.56
51  219.42 54.50  339.26  647.94
52  219.36 54.68 #VALUE! 1040.08
53  225.94 53.06  462.82 1066.12
54  233.04 52.64  425.32  916.22
55  218.48 64.22  438.06  961.36
56  205.76 56.44  292.24  534.28
57  206.06 53.42  225.32  272.24
58  206.22 52.50   190.2  117.16
59  215.44 52.14  182.12   32.56
60  221.92 51.10  175.82   47.50 

Appreciate any suggestions for improvement of the process below and hot to use the apply function or a loop function to automate it.

  1. The column OO is the variable I wnat to make create a forecast model with .

  2. The other columns are predictors that I want to test if the forecast works better with them or with only the past data of OO.

  3. I took 36 observations to fit an Arima model with function auto.arima from the "forecast" package .

  4. The function provides some model parameters p,d,q, , let's say 0,1,0

Now I want to test the model in an automated way and perform the below :

a. forecast the next period ahead , On the data table above would be equivalent to the row 37 .

b. take the results of the forecast and compare with the historical data , the row 37 , column OO.

c. call the accuracy function from the package "forecast" and compare with the data point row 37 . PLus , Store the error measures in a vector.

d. Update the 'xdata' argument adding the historical point 37 and also the 'xreg' argument with one more month for the predictor and call another forecast for the next period and redo this process until I complete a test of 24 forecasts.

Although I fitted the model with the package "forecast" I found easier to use the function 'sarima.for' from the package astsa.

Before the code , still more info :

  • Train.OO would be a time series object of the first 36 observations of the data table above
  • n.ahead = argument of the horizon of the forecast : 1 period in this case
  • 0,1,0 would be the ARIMA model (p,d,q)
  • Train.GG would be the predictor variable , teh first 36 observations of column GG
  • newxreg is just a cut of one data point in TS object from the data table that would be the predictor of forecast.

Now the code

fc.1 <- sarima.for( 
xdata = Train.00,    
n.ahead = 1, 0, 1, 0 , 
xreg = Train.GG, 
newxreg = window(ts(slack_help$GG, start = c(2009,1), 
frequency = 12), start = c(2012,1) , end = c(2012,11)))
fc.1                      
fc.1.acc <- accuracy(fc.1$pred, 
                 window(ts((slack_help$OO), start = c(2009,1),frequency = 
12),    start = c(2012,1), end = c(2012,1), frequency =12)                   

Now the second command :

fc.2 <- sarima.for( 
xdata = window(ts((slack_help$OO), start = c(2009,1),frequency = 12), 
start = c(2009,1), end = c(2012,1), frequency =12),  
n.ahead = 1, 0, 1, 0 , 
xreg = window(ts((slack_help$GG), start = c(2009,1),frequency = 12), 
start = c(2009,1), end = c(2012,1), frequency =12),
newxreg = window(ts((slack_help$GG), start = c(2009,1),frequency = 12), 
start = c(2009,2), end = c(2012,2), frequency =12),

fc.2
fc.2.acc <- accuracy(fc.2$pred, 
                 window(ts((slack_help$OO), start = c(2009,1),frequency = 
12),  start = c(2012,2), end = c(2012,2), frequency =12)

fc.2.acc 

And I did this for the the following forecasts. Basically the same code , Just updated the dates of the window functions to cut the right time series to be considered on the forecast.

Total 24 calls.

I know this was inefficient "brute force". However, I am a bit lost on how to start to develop the function/loop. Appreciate any comment or tip on how to automate the steps mentioned. Thanks in advance!


Solution

  • One of the easiest way to iterate through monthly time series is to use the fact that 1/12 is one month. As an example, if the data starts in January 2009, then we can make this equivalent to 2009.000. This can be extracted by using timeProp <-tsp(stack_help)[1] after making your data a ts-object (stack_help <- ts(stack_help, start=c(2009,1), freq=12)). Then February 2009 is timeProp + 1/12 = 2009.083, March 2009 is timeProp+2/12 = 2009.167. January 2011 is timeProp+24/12 = 2011.000 etc. Let's apply this:

    library(forecast)
    
    #first define the ts-object so we don't have to repeat it every time we use it
    stack_help <- ts(stack_help, start=c(2009,1), freq=12)
    
    #extract the start date
    timeProp <- tsp(stack_help)[1]
    
    #set vector for storing accuracy measures
    accuracyMeasure <- 0
    #set counter for above vector
    k <- 1
    
    #star the loop with the minimum length of data you want to use to estimate model.
    #In this case start with a length of 36 (Jan 2009 - Des 2011)
    for(i in 36:(nrow(stack_help)-1)){
    
      #create new train and test set for each iteration. This way makes your code
      #clearer and more transparent and easier to maintain in the future
      train <- window(stack_help, end=timeProp+((i-1)/12))
      test <- window(stack_help, start=timeProp+i/12, end=(timeProp+i/12))
    
      #Estimate the model. This can be changed to e.g. auto.arima(). Haven't tried
      #with sarima.for, but should be straight forward to use that as well.
      arrdarr <- Arima(train[,1], order=c(0,1,0), xreg=train[,2])
    
      #Forecast h=1 with the new xreg (h=1 is automatic since nrow(test)=1)
      foreArima <- forecast(arrdarr, xreg=test[,2])
    
      #Extract the test MAPE accuracy. 2 selects the test accuracy. "MAPE" can be changed to extract
      #others, e.g. #ME", "MASE". Remember that with h=1, you can scratch the "mean" part of the measures.
      accuracyMeasure[k] <- accuracy(foreArima, test[,1])[2,"MAPE"]
    
      k <- k+1
    }