I am a beginner in R and appreciate any help or tip to develop a function/loop to automate forecasting process below: Here's a dummy data set
> class(stack_help)
[1] "data.frame"
> stack_help
OO GG CC DD
1 198.12 60.56 265.5 271.24
2 145.68 52.28 328.9 427.68
3 106.48 47.08 380.24 695.60
4 83.16 43.52 443.94 934.30
5 89.72 46.68 484.6 1084.26
6 86.48 35.46 415.56 924.68
7 93.68 24.40 376.42 798.14
8 101.70 22.68 260.42 427.72
9 115.88 22.00 228.26 245.72
10 137.24 21.60 212.7 140.64
11 129.82 18.78 230.02 46.04
12 145.00 17.62 220.74 47.16
13 135.38 18.84 245.52 143.28
14 146.38 20.68 322.18 490.20
15 154.08 19.48 374.16 621.48
16 149.34 22.68 484.28 999.50
17 152.74 28.90 533.26 1223.58
18 148.62 27.44 456.76 974.44
19 158.54 23.90 417.52 820.54
20 169.96 27.08 306.16 498.02
21 152.50 33.74 283.1 309.22
22 149.68 38.44 224.54 123.82
23 149.48 38.94 215.28 30.26
24 153.38 36.24 193.18 75.46
25 155.58 37.88 243.34 228.92
26 165.84 37.00 318.08 528.58
27 171.34 38.96 393.6 707.04
28 183.60 48.20 531.62 1169.40
29 192.58 44.46 507.96 1037.22
30 207.92 43.52 435 956.96
31 228.88 47.44 399.58 788.78
32 246.14 45.74 262.84 397.66
33 228.92 45.98 240.8 255.32
34 227.52 45.22 211.44 96.02
35 232.92 43.02 203.08 62.18
36 220.16 43.88 188.56 63.74
37 221.76 46.78 210.58 131.28
38 218.94 45.10 272.36 438.64
39 221.00 47.48 351.58 689.90
40 215.82 44.68 402.82 854.80
41 222.32 43.74 435.06 1013.92
42 239.40 52.26 474.24 1128.04
43 249.86 47.62 324.92 689.40
44 240.92 49.60 289.82 538.98
45 221.04 48.40 218.74 256.80
46 191.18 47.34 192.36 136.84
47 206.28 48.66 188.22 60.60
48 226.68 48.12 174.54 58.36
49 226.76 51.66 204.26 190.58
50 223.94 53.40 272.22 454.56
51 219.42 54.50 339.26 647.94
52 219.36 54.68 #VALUE! 1040.08
53 225.94 53.06 462.82 1066.12
54 233.04 52.64 425.32 916.22
55 218.48 64.22 438.06 961.36
56 205.76 56.44 292.24 534.28
57 206.06 53.42 225.32 272.24
58 206.22 52.50 190.2 117.16
59 215.44 52.14 182.12 32.56
60 221.92 51.10 175.82 47.50
Appreciate any suggestions for improvement of the process below and hot to use the apply function or a loop function to automate it.
The column OO is the variable I wnat to make create a forecast model with .
The other columns are predictors that I want to test if the forecast works better with them or with only the past data of OO.
I took 36 observations to fit an Arima model with function auto.arima from the "forecast" package .
The function provides some model parameters p,d,q, , let's say 0,1,0
Now I want to test the model in an automated way and perform the below :
a. forecast the next period ahead , On the data table above would be equivalent to the row 37 .
b. take the results of the forecast and compare with the historical data , the row 37 , column OO.
c. call the accuracy function from the package "forecast" and compare with the data point row 37 . PLus , Store the error measures in a vector.
d. Update the 'xdata' argument adding the historical point 37 and also the 'xreg' argument with one more month for the predictor and call another forecast for the next period and redo this process until I complete a test of 24 forecasts.
Although I fitted the model with the package "forecast" I found easier to use the function 'sarima.for' from the package astsa.
Before the code , still more info :
Now the code
fc.1 <- sarima.for(
xdata = Train.00,
n.ahead = 1, 0, 1, 0 ,
xreg = Train.GG,
newxreg = window(ts(slack_help$GG, start = c(2009,1),
frequency = 12), start = c(2012,1) , end = c(2012,11)))
fc.1
fc.1.acc <- accuracy(fc.1$pred,
window(ts((slack_help$OO), start = c(2009,1),frequency =
12), start = c(2012,1), end = c(2012,1), frequency =12)
Now the second command :
fc.2 <- sarima.for(
xdata = window(ts((slack_help$OO), start = c(2009,1),frequency = 12),
start = c(2009,1), end = c(2012,1), frequency =12),
n.ahead = 1, 0, 1, 0 ,
xreg = window(ts((slack_help$GG), start = c(2009,1),frequency = 12),
start = c(2009,1), end = c(2012,1), frequency =12),
newxreg = window(ts((slack_help$GG), start = c(2009,1),frequency = 12),
start = c(2009,2), end = c(2012,2), frequency =12),
fc.2
fc.2.acc <- accuracy(fc.2$pred,
window(ts((slack_help$OO), start = c(2009,1),frequency =
12), start = c(2012,2), end = c(2012,2), frequency =12)
fc.2.acc
And I did this for the the following forecasts. Basically the same code , Just updated the dates of the window functions to cut the right time series to be considered on the forecast.
Total 24 calls.
I know this was inefficient "brute force". However, I am a bit lost on how to start to develop the function/loop. Appreciate any comment or tip on how to automate the steps mentioned. Thanks in advance!
One of the easiest way to iterate through monthly time series is to use the fact that 1/12
is one month. As an example, if the data starts in January 2009, then we can make this equivalent to 2009.000
. This can be extracted by using timeProp <-tsp(stack_help)[1]
after making your data a ts
-object (stack_help <- ts(stack_help, start=c(2009,1), freq=12)
). Then February 2009 is timeProp + 1/12
= 2009.083
, March 2009 is timeProp+2/12
= 2009.167
. January 2011 is timeProp+24/12
= 2011.000
etc. Let's apply this:
library(forecast)
#first define the ts-object so we don't have to repeat it every time we use it
stack_help <- ts(stack_help, start=c(2009,1), freq=12)
#extract the start date
timeProp <- tsp(stack_help)[1]
#set vector for storing accuracy measures
accuracyMeasure <- 0
#set counter for above vector
k <- 1
#star the loop with the minimum length of data you want to use to estimate model.
#In this case start with a length of 36 (Jan 2009 - Des 2011)
for(i in 36:(nrow(stack_help)-1)){
#create new train and test set for each iteration. This way makes your code
#clearer and more transparent and easier to maintain in the future
train <- window(stack_help, end=timeProp+((i-1)/12))
test <- window(stack_help, start=timeProp+i/12, end=(timeProp+i/12))
#Estimate the model. This can be changed to e.g. auto.arima(). Haven't tried
#with sarima.for, but should be straight forward to use that as well.
arrdarr <- Arima(train[,1], order=c(0,1,0), xreg=train[,2])
#Forecast h=1 with the new xreg (h=1 is automatic since nrow(test)=1)
foreArima <- forecast(arrdarr, xreg=test[,2])
#Extract the test MAPE accuracy. 2 selects the test accuracy. "MAPE" can be changed to extract
#others, e.g. #ME", "MASE". Remember that with h=1, you can scratch the "mean" part of the measures.
accuracyMeasure[k] <- accuracy(foreArima, test[,1])[2,"MAPE"]
k <- k+1
}