Search code examples
rforecasting

Forecasting in R - ARIMA, TBATS, UCM, Bayesian Structural time series etc


I am 2 months old to forecasting concepts but i am trying by myself to learn and keep practicing. Here I am trying to forecast weekly product movement using different forecasting techniques on training data set and testing its accuracy on test data set. I have tried different techniques like ARIMA, TBATS, Holts Winter, UCM, Bayesian Structural time series etc. But not able to improve my accuracy. Accuracy seems to be very bad. Not sure where i am going wrong. I also tried ARIMA with regressor but again it is not helping me much. I am not sure if my codes or my approach is wrong. Can anyone please guide me to improve my accuracy? Below is the weekly data set (starts from 8th Dec, 2012)

  [1]  74  76  78  63  58  58  57  56  85  73  71  91  85
 [14]  79 101  74  86  98 131  90 127 116 320 145 121 148
 [27] 112 141 153 118 151 151 152  90 147 123 266  99 110
 [40] 146 134  76  81 100  80 323  15  22  14  13  19  56
 [53]  78  79  70  79  24  26  31  35  45  33  41  41  61
 [66]  91  83  76  57  68  87  82 105  76 107 116 105 124
 [79] 127 149 124 120 111 122 134  87  80  81  89  40  63
 [92] 112  85 131  97  51  65  74  70  47  62  60  49  47
[105]  56  64  57  58  45  56  60  49  82  49  61  71  61
[118]  92  90  75  69 114  79 144 121 133 132 114 124 152
[131] 125 112 128 124 152  95  64  59  91 132 146 120 196
[144] 212 115 125  66  68  78  83  74 300  46  98  86  95
[157]  61  73  89  56  81  60  58 101 482  55 124  72  57
[170]  51  82  55  68 105 153 113 105  85  34  77  95  96
[183]  97  94  81 104  76  97  65  42  18  11

I have considered my training period as 178 weeks and testing as 14 weeks. Lets say, 'data' is my dataframe with "units" as my colname,

series    <- ts(data, start=2012+342/365.25, frequency = 365.25/7)
kk        <- 178
seas      <- 365.25/7
st        <- tsp(series)[1] + (1/seas)*(kk-1)
training  <- window(series, end = st)
testing   <- window(series, start = st + 1/52.17857, end = st+14/52.17857)

train1 <- training[,"units"]
test1  <- testing[,"units"]

##ARIMA
farima    <- forecast(auto.arima(train1),h=14)
acc_arima <- accuracy(farima$mean,test1)

##TBATS
fTBATS    <- forecast(tbats(train1,seasonal.periods=c(4,7,12,52)), h=14)
acc_TBATS <- accuracy(fTBATS$mean,test1)

##struTs
fstruTs    <- forecast(StructTS(train1), h=14)
acc_struTs <- accuracy(fstruTs$mean,test1)

##UCM
forUCM     <- ucm(formula = train1~0, data = train1, level =     
TRUE, slope = TRUE)
fUCM       <- predict(forUCM$model, n.ahead = 14)
acc_struTs <- accuracy(fUCM$fit,test1)

##Bayesian Structural time series
ss <- AddLocalLinearTrend(list(), train1)
ss <- AddSeasonal(ss, train1, nseasons = 52, season.duration = 7)

model2 <- bsts(train1, state.specification = ss, niter = 500)
fbsts <- predict(model2, horizon = 14, burn = 100)
acc_bsts <- accuracy(fbsts$mean,test1)

For all the above methods, my MAPE is above 100% which i think is very bad. Can someone please guide me to improve the accuracy? I will be very obliged. Thanks!


Solution

  • I would recommend a few things:

    1) If you are using the excellent R forecast package, I would recommend at least trying the fully automated forecast (see examples below).

    2) I would recommend plotting the forecast and actual values, along with the historic data to see if the output seems reasonable given the historic data.

    3) I would recommend reading the free on-line textbook made by some of the creators of the R forecast package.

    The example below uses the fully automated time series forecast from the forecast package and plots the results, both for the data-set you're using above, and another publicly available data-set.

    library(ggplot2)
    library(forecast)
    
    data <- read.table("./data.txt", quote="\"", comment.char="")
    series <- ts(as.numeric(data), start=2012+342/365.25, frequency = 365.25/7)
    
    train_length <- 178
    test_length <- length(series) - train_length
    train_end <- time(series)[train_length]
    test_start <- time(series)[train_length+1]
    
    training <- window(series, end = train_end)
    testing <- window(series, start = test_start)
    
    ## Use default forecast
    fcast <- forecast(training, h=test_length)
    plot(fcast)
    lines(testing, col='red')
    acc_fcast <- accuracy(fcast$mean, testing)
    
    births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
    birthstimeseries <- ts(births, frequency=12, start=c(1946,1))
    train_length <- 150
    test_length <- length(birthstimeseries) - train_length
    train_end <- time(birthstimeseries)[train_length]
    test_start <- time(birthstimeseries)[train_length+1]
    
    training <- window(birthstimeseries, end = train_end)
    testing <- window(birthstimeseries, start = test_start)
    
    ## Use default forecast
    fcast <- forecast(training, h=test_length)
    plot(fcast)
    lines(testing, col='red')
    acc_births <- accuracy(fcast$mean, testing)