
Simple Forecasting using Average method in R for Time series data for multiple groups


I've done forecasting and time series analysis for individual series, but not for a group of series in one go. I have historical data (36 months, dated the 1st of each month, as required to build the time series) for multiple groups (ModelNo.) in a data frame that looks like this:

ModelNo.       Month_Year      Quantity
a               2017-06-01         0
a               2017-07-01         5
a               2017-08-01         3
..              ..........         ....
..              ..........         ....
a               2020-05-01         6

b               2017-06-01         9
b               2017-07-01         0
b               2017-08-01         1
..              ..........         ....
..              ..........         ....         
b               2020-05-01         4

c               2020-05-01         3
c               2017-06-01         1
c               2017-07-01         1
c               2017-08-01         0
..              ..........         ....
..              ..........         ....         
c               2020-05-01         4 

I then use the code below to subset my data frame for one group, so that I can generate a forecast using the simple average method:

Selected_data<-subset(data, ModelNo.=='a')

currentMonth<-month(Sys.Date())
currentYear<-year(Sys.Date())

I then create a time series object covering 24 months, which I pass to the forecast function.

y_ts = ts(Selected_data$Quantity, start=c(currentYear-3, currentMonth), end=c(currentYear-1, currentMonth-1), frequency=12)

I then use the simple mean function to forecast the next 12 months (for which I already have actual "Quantity" values, June 2019 to May 2020):

 meanf(y_ts, 12, level = c(95))

and I get output like this (a snapshot from my original data, not from the sample data above):

         Point Forecast     Lo 95    Hi 95
Jun 2019          1.875 -3.117887 6.867887
Jul 2019          1.875 -3.117887 6.867887
Aug 2019          1.875 -3.117887 6.867887
Sep 2019          1.875 -3.117887 6.867887
Oct 2019          1.875 -3.117887 6.867887
Nov 2019          1.875 -3.117887 6.867887
Dec 2019          1.875 -3.117887 6.867887
Jan 2020          1.875 -3.117887 6.867887
Feb 2020          1.875 -3.117887 6.867887
Mar 2020          1.875 -3.117887 6.867887
Apr 2020          1.875 -3.117887 6.867887
May 2020          1.875 -3.117887 6.867887

So I'm able to generate a forecast for one ModelNo. here. However, my questions are:

  1. I have to generate this forecast for all groups in my data frame (a, b, c and so on). I don't know how to do this and store the forecast values, along with the dates, for each ModelNo. in a new data frame.

I know that the code below returns just the forecast values from the meanf output:

meanf(y_ts, 12, level = c(95))$mean

But how do I store these values for each group against the dates in a data frame? I tried mutate(), but it didn't work.

  2. Following on from question 1, how should I then compare the forecast values with the actual values (as you can see, I sliced off only 24 months of data to predict the next 12 months)? I know there are methods in R and in time series analysis that use multiple historical train and test windows and then compare the forecasts with the actual values to measure forecast accuracy. I plan to expand this to try multiple forecasting methods.

Could someone please help me with the above two questions?

I realize there is a learning curve here. I know parts of the process, but I'm not sure how to systematically fill the knowledge gap so that I can apply forecasting methods to multiple groups and test them against actual values. Apart from answers to the two questions above, any link to a tutorial that would help me learn more would be very welcome. Thank you very much.


Solution

  • Your question is rather broad, so here is a starting point for thinking about how to proceed. You did not provide reproducible data, so I used the structure you posted (sample data is at the end) and tweaked your code to make it work. The idea is to build a training and a test time series for each model, create the forecast from the training part, and store it in a data.frame alongside the actual test values. You can then calculate, for example, the RMSE to see how well the forecast fits the test data.

    library(forecast)
    library(lubridate)
    
    # date limits for the training and test windows
    train_start <- ymd("2017-06-01")
    train_end   <- ymd("2019-05-01")
    
    test_start  <- ymd("2019-06-01") # no end needed: the test window runs to the last observation
    
    # list that collects one result data.frame per model
    listed <- list()
    
    for (i in unique(data$ModelNo.)) {
      # subset one group (rows are assumed to be in date order)
      Selected_data <- subset(data, ModelNo. == i)
    
      # monthly ts starting at the group's first observed month
      first_month <- min(Selected_data$Month_Year)
      y_ts <- ts(Selected_data$Quantity,
                 start = c(year(first_month), month(first_month)),
                 frequency = 12)
    
      # training window
      train_ts <- window(y_ts,
                         start = c(year(train_start), month(train_start)),
                         end   = c(year(train_end), month(train_end)))
    
      # test window: everything from test_start onwards
      test_ts <- window(y_ts,
                        start = c(year(test_start), month(test_start)))
    
      # mean forecast over the test horizon, next to the actual values
      listed[[i]] <- cbind(
        data.frame(meanf(train_ts, h = length(test_ts), level = 95)),
        real = as.vector(test_ts))
    }
    

    Now for part 1, you can combine the results into a single data.frame (the output is from one run on the sample data; exact values will differ with other data):

    res <- do.call(rbind,listed)
    head(res) # only head to simplify output
               Point.Forecast     Lo.95    Hi.95 real
    a.Jun 2019       49.29167 -22.57528 121.1586   95
    a.Jul 2019       49.29167 -22.57528 121.1586   93
    a.Aug 2019       49.29167 -22.57528 121.1586    5
    a.Sep 2019       49.29167 -22.57528 121.1586   66
    a.Oct 2019       49.29167 -22.57528 121.1586   47
    a.Nov 2019       49.29167 -22.57528 121.1586   40
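
To get an explicit ModelNo. column and a proper date column instead of row names like a.Jun 2019, one option is to build the pieces per group and stack them. The following is a minimal base-R sketch, not the code above: it assumes the same column names as the question, uses its own hypothetical sample data, and computes the average-method point forecast directly with mean() (that point forecast is the mean of the training values), so it runs without extra packages. The names train_end, per_model and res_long are made up for illustration.

```r
# Hypothetical sample data in the same shape as the question's data frame
set.seed(1234)
dates <- seq(as.Date("2017-06-01"), by = "month", length.out = 36)
data <- data.frame(
  ModelNo.   = rep(c("a", "b", "c"), each = 36),
  Month_Year = rep(dates, times = 3),
  Quantity   = sample(1:100, 108, replace = TRUE),
  stringsAsFactors = FALSE
)

train_end <- as.Date("2019-05-01")  # last month used for training

# One data frame per model: test-period dates, forecast, actual value
per_model <- lapply(split(data, data$ModelNo.), function(d) {
  d     <- d[order(d$Month_Year), ]       # make sure rows are in date order
  train <- d[d$Month_Year <= train_end, ]
  test  <- d[d$Month_Year >  train_end, ]
  data.frame(
    ModelNo.   = test$ModelNo.,
    Month_Year = test$Month_Year,
    forecast   = mean(train$Quantity),    # average-method point forecast
    real       = test$Quantity
  )
})

# Stack the pieces into one long data frame with plain row names
res_long <- do.call(rbind, per_model)
rownames(res_long) <- NULL
head(res_long)
```

If you need the prediction intervals as well, the mean() call could be swapped for the meanf() output used in the loop above; the long layout stays the same.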
    

    For point 2, you can calculate the RMSE (there is a handy function in the Metrics package) for each time series:

    library(Metrics)
    goodness <- lapply(listed, function(x)rmse(x$real, x$Point.Forecast))
    goodness 
    $a
    [1] 31.8692
    
    $b
    [1] 30.69859
    
    $c
    [1] 30.28037
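
Since the question mentions trying several forecasting methods, the same train/test split can be scored for more than one method at once. This is a rough base-R sketch under assumptions: hypothetical sample data, a hand-written root_mse() helper instead of the Metrics package, and the naive method (repeat the last training value) chosen only as an example of a second method.

```r
# Hypothetical sample data in the same shape as the question's data frame
set.seed(1234)
data <- data.frame(
  ModelNo.   = rep(c("a", "b", "c"), each = 36),
  Month_Year = rep(seq(as.Date("2017-06-01"), by = "month", length.out = 36), 3),
  Quantity   = sample(1:100, 108, replace = TRUE),
  stringsAsFactors = FALSE
)

train_end <- as.Date("2019-05-01")  # last month used for training

# Illustrative helper: root mean squared error
root_mse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# One row per model, one column per forecasting method
scores <- do.call(rbind, lapply(split(data, data$ModelNo.), function(d) {
  d     <- d[order(d$Month_Year), ]
  train <- d$Quantity[d$Month_Year <= train_end]
  test  <- d$Quantity[d$Month_Year >  train_end]
  data.frame(
    ModelNo.   = d$ModelNo.[1],
    rmse_mean  = root_mse(test, mean(train)),          # average method
    rmse_naive = root_mse(test, train[length(train)])  # naive: last training value
  )
}))
rownames(scores) <- NULL
scores
```

A lower RMSE on the test window suggests the better method for that model, although a single split is a fairly noisy basis for that decision.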
    

    With data:

    set.seed(1234)
    data <- data.frame(ModelNo.   = c(rep("a", 36), rep("b", 36), rep("c", 36)),
                       Month_Year = lubridate::ymd(rep(seq(as.Date("2017/6/1"), by = "month", length.out = 36), 3)),
                       Quantity   = sample(1:100, 108, replace = TRUE))
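
The question also asks about evaluating over several train/test windows instead of one fixed split. A minimal base-R sketch of such a rolling-origin evaluation for the average method is shown below. The series y, the minimum training size min_train and the horizon h are hypothetical values chosen for illustration; the forecast package's tsCV() function offers a packaged take on the same idea.

```r
# Hypothetical monthly series for one model (36 values)
set.seed(1234)
y <- sample(1:100, 36, replace = TRUE)

min_train <- 24  # smallest training window, in months
h         <- 3   # forecast horizon, in months

# Index of the last training observation for each fold
origins <- min_train:(length(y) - h)

# For each fold: train on y[1..end_train], score the next h values
fold_rmse <- sapply(origins, function(end_train) {
  train <- y[1:end_train]
  test  <- y[(end_train + 1):(end_train + h)]
  sqrt(mean((test - mean(train))^2))  # RMSE of the average-method forecast
})

data.frame(train_size = origins, rmse = fold_rmse)
mean(fold_rmse)  # overall score across folds
```

To apply this to every model, the same function could be run on each group's Quantity vector, for example via split() and lapply().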