I am 2 months old to forecasting concepts but i am trying by myself to learn and keep practicing. Here I am trying to forecast weekly product movement using different forecasting techniques on training data set and testing its accuracy on test data set. I have tried different techniques like ARIMA, TBATS, Holts Winter, UCM, Bayesian Structural time series etc. But not able to improve my accuracy. Accuracy seems to be very bad. Not sure where i am going wrong. I also tried ARIMA with regressor but again it is not helping me much. I am not sure if my codes or my approach is wrong. Can anyone please guide me to improve my accuracy? Below is the weekly data set (starts from 8th Dec, 2012)
[1] 74 76 78 63 58 58 57 56 85 73 71 91 85
[14] 79 101 74 86 98 131 90 127 116 320 145 121 148
[27] 112 141 153 118 151 151 152 90 147 123 266 99 110
[40] 146 134 76 81 100 80 323 15 22 14 13 19 56
[53] 78 79 70 79 24 26 31 35 45 33 41 41 61
[66] 91 83 76 57 68 87 82 105 76 107 116 105 124
[79] 127 149 124 120 111 122 134 87 80 81 89 40 63
[92] 112 85 131 97 51 65 74 70 47 62 60 49 47
[105] 56 64 57 58 45 56 60 49 82 49 61 71 61
[118] 92 90 75 69 114 79 144 121 133 132 114 124 152
[131] 125 112 128 124 152 95 64 59 91 132 146 120 196
[144] 212 115 125 66 68 78 83 74 300 46 98 86 95
[157] 61 73 89 56 81 60 58 101 482 55 124 72 57
[170] 51 82 55 68 105 153 113 105 85 34 77 95 96
[183] 97 94 81 104 76 97 65 42 18 11
I have considered my training period as 178 weeks and testing as 14 weeks. Lets say, 'data' is my dataframe with "units" as my colname,
series <- ts(data, start=2012+342/365.25, frequency = 365.25/7)
kk <- 178
seas <- 365.25/7
st <- tsp(series)[1] + (1/seas)*(kk-1)
training <- window(series, end = st)
testing <- window(series, start = st + 1/52.17857, end = st+14/52.17857)
train1 <- training[,"units"]
test1 <- testing[,"units"]
##ARIMA
farima <- forecast(auto.arima(train1),h=14)
acc_arima <- accuracy(farima$mean,test1)
##TBATS
fTBATS <- forecast(tbats(train1,seasonal.periods=c(4,7,12,52)), h=14)
acc_TBATS <- accuracy(fTBATS$mean,test1)
##struTs
fstruTs <- forecast(StructTS(train1), h=14)
acc_struTs <- accuracy(fstruTs$mean,test1)
##UCM
forUCM <- ucm(formula = train1~0, data = train1, level =
TRUE, slope = TRUE)
fUCM <- predict(forUCM$model, n.ahead = 14)
acc_struTs <- accuracy(fUCM$fit,test1)
##Bayesian Structural time series
ss <- AddLocalLinearTrend(list(), train1)
ss <- AddSeasonal(ss, train1, nseasons = 52, season.duration = 7)
model2 <- bsts(train1, state.specification = ss, niter = 500)
fbsts <- predict(model2, horizon = 14, burn = 100)
acc_bsts <- accuracy(fbsts$mean,test1)
For all the above methods, my MAPE is above 100% which i think is very bad. Can someone please guide me to improve the accuracy? I will be very obliged. Thanks!
I would recommend a few things:
1) If you are using the excellent R forecast
package, I would recommend at least trying the fully automated forecast (see examples below).
2) I would recommend plotting the forecast and actual values, along with the historic data to see if the output seems reasonable given the historic data.
3) I would recommend reading the free on-line textbook made by some of the creators of the R forecast
package.
The example below uses the fully automated time series forecast from the forecast
package and plots the results, both for the data-set you're using above, and another publicly available data-set.
library(ggplot2)
library(forecast)
data <- read.table("./data.txt", quote="\"", comment.char="")
series <- ts(as.numeric(data), start=2012+342/365.25, frequency = 365.25/7)
train_length <- 178
test_length <- length(series) - train_length
train_end <- time(series)[train_length]
test_start <- time(series)[train_length+1]
training <- window(series, end = train_end)
testing <- window(series, start = test_start)
## Use default forecast
fcast <- forecast(training, h=test_length)
plot(fcast)
lines(testing, col='red')
acc_fcast <- accuracy(fcast$mean, testing)
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birthstimeseries <- ts(births, frequency=12, start=c(1946,1))
train_length <- 150
test_length <- length(birthstimeseries) - train_length
train_end <- time(birthstimeseries)[train_length]
test_start <- time(birthstimeseries)[train_length+1]
training <- window(birthstimeseries, end = train_end)
testing <- window(birthstimeseries, start = test_start)
## Use default forecast
fcast <- forecast(training, h=test_length)
plot(fcast)
lines(testing, col='red')
acc_births <- accuracy(fcast$mean, testing)