Search code examples
rplottime-seriesforecastingfable-r

Time series forecasting in R; plotting "events" and generating new forecasting plots with specified date range after initial forecast


I have created a function which allows me to carry out time series forecasting using the fable package. The idea of the function was to analyse observed vs predicted values after a particular date/event. Here is a mock data frame which generates a column of dates:-

set.seed(1)
df <- data.frame(Date = sort(sample(seq(as.Date('2018/01/01'), as.Date('2020/09/17'), by="day"),1368883, replace = T)))

And here is the function I created. You specify the data, then the date of the event, then the forecast period in days and lastly a title for your graph.

event_analysis<-function(data,eventdate,period,title){
  require(dplyr)
  require(tsibble)
  require(fable)
  require(fabletools)
  require(imputeTS)
  require(ggplot2)
  data_count<-data%>%
    group_by(Date)%>%
    summarise(Count=n())
  
  data_count<-as_tsibble(data_count)
  data_count<-na_mean(data_count)
  
  
  train <- data_count %>%
    #sample_frac(0.8)
    filter(Date<=as.Date(eventdate))
  
  fit <- train %>%
    model(
      ets = ETS(Count),
      arima = ARIMA(Count),
      snaive = SNAIVE(Count)
    ) %>%
    mutate(mixed = (ets + arima + snaive) / 3)
  
  
  fc <- fit %>% forecast(h = period)
  
  forecastplot<-fc %>%
    autoplot(data_count, level = NULL)+ggtitle(title)+
    geom_vline(xintercept = as.Date(eventdate),linetype="dashed",color="red")+
    labs(caption = "Red dashed line = Event occurrence")
                                                                 
  
  fc_accuracy<-accuracy(fc,data_count)
  
  #obs<-data_count
  #colnames(obs)[2]<-"Observed"
  #obs_pred<-merge(data_count,fc_accuracy, by="Date")
  return(list(forecastplot,fc_accuracy,fc))
}

And in one run, I specify the df, the date of the event, the number of days that I want to forecast (3 weeks), then the title:-

event_analysis(df, "2020-01-01",21,"Event forecast")

Which will print this outcome and plot:-

enter image description here

enter image description here

I concede that the mock data I made isn't totally ideal but the function works well on my real-world data.

Here is what I want to achieve. I would like this output that has been made to come out of the function, but in addition, I would like an additional graph which "zooms in" on the period that has been forecasted, for 2 reasons:-

  1. for ease of interpretation
  2. I want to be able to see the N number of days before and N number of days after the event date (N representing the forecast period i.e. 21).

So, an additional graph (along with the original full forecast) that would look like this, perhaps in the one output, "multiplot" style:-

enter image description here

The other thing would be to print another output which shows the observed values in the test set against the predicted values from the models used in the forecasting.

These are basically the two additional things I want to add to my function but I am not sure how to go about this. Any help is massively appreciated :) .


Solution

  • I suppose you could rewrite it this way. I made a couple of adjustments to help you out.

    set.seed(1)
    df <- data.frame(Date = sort(sample(seq(as.Date('2018/01/01'), as.Date('2020/09/17'), by="day"),1368883, replace = T)))
    
    event_analysis <- function(data, eventdate, period, title){
     
     # in the future, you may think to move them out
     library(dplyr)
     library(tsibble)
     library(fable)
     library(fabletools)
     library(imputeTS)
     library(ggplot2)
     
     # convert at the beginning
     eventdate <- as.Date(eventdate)
     
     # more compact sintax
     data_count <- count(data, Date, name = "Count")
     
     # better specify the date variable to avoid the message
     data_count <- as_tsibble(data_count, index = Date)
     
     # you need to complete missing dates, just in case
     data_count <- tsibble::fill_gaps(data_count)
     
     
     data_count <- na_mean(data_count)
     
     
     train <- data_count %>%
      filter(Date <= eventdate)
     
     test <- data_count %>%
      filter(Date > eventdate, Date <= (eventdate+period))
     
      fit <- train %>%
      model(
       ets    = ETS(Count),
       arima  = ARIMA(Count),
       snaive = SNAIVE(Count)
      ) %>%
      mutate(mixed = (ets + arima + snaive) / 3)
     
     
     fc <- fit %>% forecast(h = period)
     
    
     # your plot
     forecastplot <- fc %>%
      autoplot(data_count, level = NULL) + 
      ggtitle(title) +
      geom_vline(xintercept = as.Date(eventdate), linetype = "dashed", color = "red") +
      labs(caption = "Red dashed line = Event occurrence")
     
     
     # plot just forecast and test
     zoomfcstplot <- autoplot(fc) + autolayer(test, .vars = Count)
     
     fc_accuracy <- accuracy(fc,data_count)
     
    
     ### EDIT: ###
    
     # results vs test
     res <- fc %>% 
      as_tibble() %>% 
      select(-Count) %>% 
      tidyr::pivot_wider(names_from = .model, values_from = .mean) %>% 
      inner_join(test, by = "Date")
    
     ##############
     
    
     return(list(forecastplot = forecastplot,
                 zoomplot     = zoomfcstplot,
                 accuracy     = fc_accuracy,
                 forecast     = fc,
                 results      = res))
    }
    
    
    event_analysis(df, 
                   eventdate = "2020-01-01",
                   period    = 21,
                   title     = "Event forecast")
    
    

    Output:

    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    #> Carico il pacchetto richiesto: fabletools
    #> Registered S3 method overwritten by 'quantmod':
    #>   method            from
    #>   as.zoo.data.frame zoo
    #> $forecastplot
    
    

    enter image description here

    #> 
    #> $zoomplot
    

    enter image description here

    #> 
    #> $accuracy
    #> # A tibble: 4 x 9
    #>   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE    ACF1
    #>   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
    #> 1 arima  Test  -16.8  41.8  35.2 -1.31  2.61 0.791  0.164 
    #> 2 ets    Test  -16.8  41.8  35.2 -1.31  2.61 0.791  0.164 
    #> 3 mixed  Test  -21.9  44.7  36.8 -1.68  2.73 0.825 -0.0682
    #> 4 snaive Test  -32.1  57.3  46.6 -2.43  3.45 1.05  -0.377 
    #> 
    #> $forecast
    #> # A fable: 84 x 4 [1D]
    #> # Key:     .model [4]
    #>    .model Date               Count .mean
    #>    <chr>  <date>            <dist> <dbl>
    #>  1 ets    2020-01-02 N(1383, 1505) 1383.
    #>  2 ets    2020-01-03 N(1383, 1505) 1383.
    #>  3 ets    2020-01-04 N(1383, 1505) 1383.
    #>  4 ets    2020-01-05 N(1383, 1505) 1383.
    #>  5 ets    2020-01-06 N(1383, 1505) 1383.
    #>  6 ets    2020-01-07 N(1383, 1505) 1383.
    #>  7 ets    2020-01-08 N(1383, 1505) 1383.
    #>  8 ets    2020-01-09 N(1383, 1505) 1383.
    #>  9 ets    2020-01-10 N(1383, 1505) 1383.
    #> 10 ets    2020-01-11 N(1383, 1505) 1383.
    #> # ... with 74 more rows
    #>
    #> $results
    #> # A tibble: 21 x 6
    #>    Date         ets arima snaive mixed Count
    #>    <date>     <dbl> <dbl>  <dbl> <dbl> <int>
    #>  1 2020-01-02 1383. 1383.   1386 1384.  1350
    #>  2 2020-01-03 1383. 1383.   1366 1377.  1398
    #>  3 2020-01-04 1383. 1383.   1426 1397.  1357
    #>  4 2020-01-05 1383. 1383.   1398 1388.  1415
    #>  5 2020-01-06 1383. 1383.   1431 1399.  1399
    #>  6 2020-01-07 1383. 1383.   1431 1399.  1346
    #>  7 2020-01-08 1383. 1383.   1350 1372.  1299
    #>  8 2020-01-09 1383. 1383.   1386 1384.  1303
    #>  9 2020-01-10 1383. 1383.   1366 1377.  1365
    #> 10 2020-01-11 1383. 1383.   1426 1397.  1328
    #> # ... with 11 more rows