Tags: r, time-series, forecasting, arima

Time series forecasting with daily values


I am forecasting univariate data with auto.arima(), but my forecast is not correct. I believe I have followed all the steps correctly, yet the point forecast value does not come out right. Please help.

Here is my data:

s <- read.csv(url('https://ondemand.websol.barchart.com/getHistory.csv?apikey=c3122f072488a29c5279680b9a2cf88e&symbol=zs*1&type=dailyNearest&backAdjust=false&startDate=20100201'))

Here is my code:

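library(zoo)
library(forecast)
library(ggplot2)

data <- s[c(3, 7)]  # keep the tradingDay and close columns
summary(data)
# daily index from the first to the last trading day
data1.ts <- zoo(data[,2], seq(from = as.Date("2010-02-01"), to = as.Date("2022-05-13"), by = 1))
autoplot(data1.ts)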

ARIMA model:

fit_arima <- auto.arima(data1.ts, stepwise = FALSE, approximation = FALSE, trace = TRUE)
print(summary(fit_arima)) 
checkresiduals(fit_arima)


forecast_Arima <- forecast(fit_arima, h = 1)
forecast_Arima

Forecast value:

      Point Forecast   Lo 80    Hi 80   Lo 95    Hi 95
19126       976.4357 949.874 1002.997 935.813 1017.058
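The row label 19126 is not a year. As a sketch, assuming auto.arima() converted the zoo series to a ts with as.ts() so that its time axis counts days since 1970-01-01, the label can be turned back into a calendar date:

```r
# Assumption: the forecast row label is a Date index stored as days since 1970-01-01
forecast_label <- 19126
forecast_date <- as.Date(forecast_label, origin = "1970-01-01")
print(forecast_date)  # "2022-05-14", the day after the last date in the index
```

So the date label itself is plausible; the problem lies in the values the model was fitted on.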

Update:

I have since loaded the data as a ts object and now get a correct point forecast, but the forecast is labelled with the wrong year. The one-step-ahead forecast is placed in 2021, although my series ends on 2022-05-13. I just want to correct the year. This is the new code:

ts_soy <- ts(data[,2], start = c(2010-02-01), frequency = 214)
autoplot(ts_soy)

fit_arima <- auto.arima(ts_soy) 
print(summary(fit_arima)) 
checkresiduals(fit_arima)

forecast_Arima <- forecast(fit_arima, h = 1)
forecast_Arima


      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
2021.472         1646.5 1625.071 1667.929 1613.727 1679.273
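The 2021 label can be traced to the `start` argument. Inside `c()`, `2010-02-01` is not a date but the subtraction 2010 - 2 - 1, so the series starts in 2007, and with `frequency = 214` observation i sits at time 2007 + (i - 1)/214. The sketch below reproduces the label using dummy values, assuming the series holds 3097 closes (the count the answer below works with). It also shows that writing `start = c(2010, 1)` alone would move the end to 2024, because 214 periods per year is fewer than the roughly 252 trading days per year in this data (3097 closes over about 12.3 years).

```r
# Inside c(), 2010-02-01 is subtraction, not a date: 2010 - 2 - 1
start_value <- c(2010-02-01)
print(start_value)  # 2007

# Assumption: 3097 closing prices, as in the question's data
n_obs <- 3097

# Rebuild the question's series with dummy values to inspect its time axis
x_question <- ts(seq_len(n_obs), start = start_value, frequency = 214)
last_time <- tail(as.numeric(time(x_question)), 1)
# One step ahead of the last observation is last_time + 1/214
next_time <- last_time + 1 / 214
print(round(next_time, 3))  # 2021.472

# start as c(year, period) fixes the start year, but frequency = 214
# still stretches 3097 points well past 2022
x_fixed_start <- ts(seq_len(n_obs), start = c(2010, 1), frequency = 214)
fixed_last <- tail(as.numeric(time(x_fixed_start)), 1)
print(round(fixed_last, 3))  # 2024.467
```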

Solution

  • I can reproduce your issue. The cause is that data1.ts has more index entries than values. You are building a gap-free daily index to get rid of the weekends, which is a sound idea, but every calendar day from 2010-02-01 to 2022-05-13 gives 4485 dates while you only have 3097 closing prices, so the index is 1388 entries too long. zoo() recycles the values to fill the index, so closing prices from the early years are repeated at the end of the series, and auto.arima() ends up forecasting from those old prices.

    Instead, build an index that starts at the earliest date and ends at that date plus the number of records minus 1:

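    library(zoo)
    library(forecast)
    
    # one index entry per record: first date + 0, 1, ..., nrow(data) - 1 (3096 for this data)
    data.ts <- zoo(data[,2], seq(from = as.Date("2010-02-01"), 
                                 to = as.Date("2010-02-01") + nrow(data) - 1, 
                                 by = 1))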
    
    fit_ar <- forecast::auto.arima(data.ts, stepwise = FALSE, approximation = FALSE)
    
    forecast::forecast(fit_ar, h = 1)
    
          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
    17738       1648.129 1626.759 1669.499 1615.446 1680.812
    

    This is one of the reasons I prefer fable: it makes the data easier to inspect.

    library(fpp3)
    library(fable.prophet)
    
    fit <- data %>% 
      mutate(id = row_number()) %>% # row index, so the series has no weekend gaps
      as_tsibble(index = id) %>%
      model(naive = NAIVE(close),
            arima = ARIMA(close, stepwise = FALSE, approximation = FALSE))
    
    forecast(fit, h = 1)
    # A fable: 2 x 4 [1]
    # Key:     .model [2]
      .model    id        close .mean
      <chr>  <dbl>       <dist> <dbl>
    1 naive   3098 N(1646, 280) 1646.
    2 arima   3098 N(1649, 278) 1649.
    
    # prophet needs real dates and can handle the weekend gaps
    prophet_fit <- data %>% 
      mutate(tradingDay = ymd(tradingDay)) %>%
      as_tsibble(index = tradingDay) %>%
      model(prophet_model = prophet(close))
    
     
    forecast(prophet_fit, h = 1)
    # A fable: 1 x 4 [1D]
    # Key:     .model [1]
      .model        tradingDay        close .mean
      <chr>         <date>           <dist> <dbl>
    1 prophet_model 2022-05-14 sample[5000] 1657.
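The length mismatch behind this answer can be checked directly. The sketch below assumes 3097 closing prices, the count used in the answer (its index ends 3096 days after the first date). It counts the calendar days in the question's index, shows on a toy series that zoo() recycles a short value vector to fill a longer index, and builds a safe index with a hypothetical helper, `daily_index_zoo()`, that always produces exactly one date per value.

```r
library(zoo)

# Assumption: 3097 closing prices, as in the question's data
n_obs <- 3097

# The question's index: every calendar day from the first to the last trading day
calendar_days <- seq(from = as.Date("2010-02-01"), to = as.Date("2022-05-13"), by = 1)
n_days <- length(calendar_days)
print(n_days)          # 4485
print(n_days - n_obs)  # 1388 index entries with no value of their own

# Toy example: 3 values on a 5-day index, so zoo() repeats the values to fill it
toy <- zoo(c(10, 20, 30), as.Date("2022-01-01") + 0:4)
print(as.numeric(coredata(toy)))  # 10 20 30 10 20

# Hypothetical helper: one consecutive date per value, so nothing is recycled
daily_index_zoo <- function(values, first_date) {
  zoo(values, as.Date(first_date) + seq_along(values) - 1)
}

safe <- daily_index_zoo(c(10, 20, 30), "2022-01-01")
print(length(safe))  # 3
```

The answer's fix follows the same rule: the last date is the first date plus the number of records minus one.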