I have daily data for a dengue index from January 2010 to July 2015:
date dengue_index
1/1/2010 0.169194109
1/2/2010 0.172350434
1/3/2010 0.174939783
1/4/2010 0.176244642
1/5/2010 0.176658068
1/6/2010 0.177815751
1/7/2010 0.17893075
1/8/2010 0.1813232
1/9/2010 0.182199531
1/10/2010 0.185091158
1/11/2010 0.185267748
1/12/2010 0.185894524
1/13/2010 0.18511499
1/14/2010 0.188080728
1/15/2010 0.190019472
… …
7/20/2015 0.112748885
7/21/2015 0.113246022
7/22/2015 0.111755091
7/23/2015 0.112164176
7/24/2015 0.11429011
7/25/2015 0.113951836
7/26/2015 0.11319131
7/27/2015 0.112918734
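ts() and msts() only know the position of each observation, not its date. Before fitting, it is worth confirming that the file has exactly one row per calendar day. Here is a minimal sketch, assuming the dates are in month/day/year form as in the listing above. `sample_txt` and `step_days` are hypothetical names, and only the first five rows are inlined so that it runs without the CSV:

```r
# Excerpt of the question's data, inlined so the check runs without the CSV.
sample_txt <- "date;dengue_index
1/1/2010;0.169194109
1/2/2010;0.172350434
1/3/2010;0.174939783
1/4/2010;0.176244642
1/5/2010;0.176658068"
dengue_series <- read.table(text = sample_txt, header = TRUE, sep = ";")

# Dates are month/day/year, without leading zeros.
dengue_series$date <- as.Date(dengue_series$date, format = "%m/%d/%Y")

# ts()/msts() assume equally spaced observations, so every step should be 1 day.
step_days <- as.numeric(diff(dengue_series$date), units = "days")
all_daily <- all(step_days == 1)
print(range(dengue_series$date))
print(all_daily)
```

With the full file, apply the same as.Date() and diff() lines to dengue_series after read.csv(). Any step other than 1 points to a missing or duplicated day.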
I want to predict the values until the end of 2016 using R.
library(forecast)
setwd("...")
dengue_series <- read.csv(file="r_wikipedia-googletrends-model.csv", header=TRUE, sep=";")
dengue_index <- ts(dengue_series$dengue_index, frequency=7)
plot(dengue_index)
# lambda=0: fit on the log scale (Box-Cox with lambda 0), so back-transformed forecasts stay positive
fit <- auto.arima(dengue_index, lambda=0)
fit
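# forecast until 31 December 2016: 523 daily steps after the last
# observation on 27 July 2015 (500 steps would stop at 8 December 2016)
forecast_series <- forecast(fit, h=523)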
forecast_series
plot(forecast_series)
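Each forecast step is one day, so the horizon needed to reach the end of 2016 can be computed from the dates rather than guessed. Here is a small base-R sketch, taking 27 July 2015 (the final row shown above) as the last observation:

```r
# Last observation in the question's data and the requested end of the forecast.
last_obs <- as.Date("2015-07-27")
target   <- as.Date("2016-12-31")

# One forecast step per day, so the horizon is the number of days in between.
h <- as.numeric(target - last_obs, units = "days")
print(h)               # 523
print(last_obs + 500)  # "2016-12-08": where a 500-step forecast ends
```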
Problem: the forecast is not good.
How can I improve it?
Link to the data source: https://www.dropbox.com/s/wnvc4e78t124fkd/r_wikipedia-googletrends-model.csv?dl=0
You can try specifying the data as a multi-seasonal time series object with msts() and then forecasting with tbats(). tbats is referenced in the paper that David Arenburg mentions in the comments.
Here's an example using the taylor dataset that ships with the forecast package. It has two seasonal periods: 48 half-hour periods in a day and 336 half-hour periods in a week (i.e. 336 / 48 = 7 days).
x <- msts(taylor, seasonal.periods=c(48,336), ts.frequency=48, start=2000+22/52)
fit <- tbats(x)
fc <- forecast(fit)
# not shown, but the forecast seems to capture both seasonal patterns
plot(fc)
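To check what msts() actually records, here is a short sketch, assuming the forecast package is installed. As far as I can tell, tbats() takes the seasonal periods from the "msts" attribute rather than from frequency(), which only sets the time index:

```r
library(forecast)

# Same construction as the example above.
x <- msts(taylor, seasonal.periods = c(48, 336), ts.frequency = 48,
          start = 2000 + 22/52)

# The seasonal periods are stored on the object as the "msts" attribute,
# separately from the ts frequency used for the time index.
print(attr(x, "msts"))   # 48 336
print(frequency(x))      # 48
print(length(x) / 336)   # number of weeks of half-hourly data
```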
Also see http://users.ox.ac.uk/~mast0315/CompareUnivLoad.pdf for more information on the taylor dataset.
For your daily data, with a weekly pattern (period 7) plus a longer cycle such as 84 days (12 weeks), perhaps
tsdat <- msts(dat, seasonal.periods=c(7, 84), ts.frequency=7, start=2010)
Or, with a yearly cycle of 365.25 days instead:
tsdat <- msts(dat, seasonal.periods=c(7, 365.25), ts.frequency=7, start=2010)
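One way to choose between the two candidate period sets is to hold out the most recent observations, fit each candidate on the rest, and compare the errors. This is a sketch under several assumptions. `y` here is a synthetic placeholder, NOT the dengue data. The 90-day holdout and the mean absolute error are arbitrary choices, and `candidates` and `mae` are hypothetical names. Replace `y` with data$dengue_index to run it for real. tbats() can take a while on long series.

```r
library(forecast)

# Synthetic placeholder (NOT the real data): about three years of daily
# values with a yearly cycle, a small weekly ripple and noise.
set.seed(1)
n <- 3 * 365
tt <- seq_len(n)
y <- 0.15 + 0.03 * sin(2 * pi * tt / 365.25) +
  0.005 * sin(2 * pi * tt / 7) + rnorm(n, sd = 0.003)

# Hold out the last 90 days; fit each candidate on the rest.
h <- 90
train <- y[seq_len(n - h)]
test  <- y[(n - h + 1):n]

candidates <- list(weekly_84 = c(7, 84), weekly_yearly = c(7, 365.25))
mae <- sapply(candidates, function(periods) {
  x   <- msts(train, seasonal.periods = periods)
  fit <- tbats(x, use.parallel = FALSE)  # serial fitting, for portability
  fc  <- forecast(fit, h = h)
  mean(abs(as.numeric(fc$mean) - test))  # mean absolute error on the holdout
})
print(mae)  # lower is better on this holdout
```

The set with the lower holdout error is the better bet for the data it was tested on. On the synthetic placeholder, the numbers themselves say nothing about the dengue series.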
EDIT
Using the provided data, weekly and yearly seasonal periods (7 and 365 days) give what looks like a decent forecast.
library(forecast)
data <- read.table("r_wikipedia-googletrends-model.csv", header=TRUE, sep=";")
dengue_index <- msts(data$dengue_index, seasonal.periods=c(7, 365), ts.frequency=7)
fit <- tbats(dengue_index)
fc <- forecast(fit)
plot(fc)
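forecast(fit) uses its own default horizon. To stop exactly at 31 December 2016 and get dated output, h can be set explicitly and the steps mapped back to calendar dates. This is a sketch assuming a synthetic placeholder series that ends on 27 July 2015; swap in the real values and the parsed dates. `out` is a hypothetical name. Column 2 of fc$lower and fc$upper is the 95% interval because level = c(80, 95).

```r
library(forecast)

# Placeholder series (NOT the real data): replace `y` and `dates` with
# data$dengue_index and the parsed date column.
set.seed(2)
dates <- seq(as.Date("2013-01-01"), as.Date("2015-07-27"), by = "day")
tt <- seq_along(dates)
y <- 0.12 + 0.03 * sin(2 * pi * tt / 365.25) + rnorm(length(tt), sd = 0.003)

dengue_index <- msts(y, seasonal.periods = c(7, 365), ts.frequency = 7)
fit <- tbats(dengue_index, use.parallel = FALSE)

# Forecast exactly up to 31 December 2016 instead of the default horizon.
last_obs <- max(dates)
h <- as.numeric(as.Date("2016-12-31") - last_obs, units = "days")
fc <- forecast(fit, h = h, level = c(80, 95))

# Attach a calendar date to each forecast step.
out <- data.frame(
  date  = last_obs + seq_len(h),
  point = as.numeric(fc$mean),
  lo95  = as.numeric(fc$lower[, 2]),
  hi95  = as.numeric(fc$upper[, 2])
)
print(tail(out, 3))
```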