Tags: r, time-series, xgboost, forecasting, rolling-average

Use rolling mean as historic data for multiple future data points in RStudio


I'm building a multifactorial sales forecasting model in RStudio using XGBoost regression. I have built lag features with this function:

Create_lags <- function(MyData, start_index_lag, num_lags) {
  # Lags run from start_index_lag to start_index_lag + num_lags (inclusive)
  lags <- seq(from = start_index_lag, to = start_index_lag + num_lags)
  # Zero-padded column names, e.g. lag_04 when the largest lag has two digits
  lag_names <- paste("lag", formatC(lags, width = nchar(max(lags)), flag = "0"),
                     sep = "_")
  # One dplyr::lag() call per lag, stored as text for funs_()
  lag_functions <- setNames(paste("dplyr::lag(.,", lags, ")"), lag_names)
  print(lag_functions)
  MyData <- MyData %>%
    arrange(Channel, Product) %>%
    group_by(Channel, Product) %>%
    mutate_at(vars(Sales), funs_(lag_functions))
  print(colnames(MyData))
  return(MyData)
}

This works fine. I have also built rolling means and rolling standard deviations with the functions below:

Create_rolling_window_means <- function(MyData, start_index_rollfeat, num_rollfeat) {
  # Window sizes run from start_index_rollfeat to start_index_rollfeat + num_rollfeat
  rollmean_1 <- seq(from = start_index_rollfeat, to = start_index_rollfeat + num_rollfeat)
  rollmean_names <- paste("rollmean",
                          formatC(rollmean_1, width = nchar(max(rollmean_1)), flag = "0"),
                          sep = "")
  # Right-aligned rolling mean, lagged by 1 so each row only uses earlier rows
  rollmean_functions <- setNames(paste("lag(roll_meanr(.,", rollmean_1, ")", ",1)"), rollmean_names)
  print(rollmean_functions)
  MyData <- MyData %>%
    arrange(Channel, Product) %>%
    group_by(Channel, Product) %>%
    mutate_at(vars(Sales), funs_(rollmean_functions))
  print(colnames(MyData))
  return(MyData)
}

Create_rolling_window_sd <- function(MyData, start_index_rollfeat, num_rollfeat) {
  # Window sizes run from start_index_rollfeat to start_index_rollfeat + num_rollfeat
  rollsd_1 <- seq(from = start_index_rollfeat, to = start_index_rollfeat + num_rollfeat)
  rollsd_names <- paste("rollsd",
                        formatC(rollsd_1, width = nchar(max(rollsd_1)), flag = "0"),
                        sep = "")
  # Right-aligned rolling standard deviation, lagged by 1 so each row only uses earlier rows
  rollsd_functions <- setNames(paste("lag(roll_sdr(.,", rollsd_1, ")", ",1)"), rollsd_names)
  print(rollsd_functions)
  MyData <- MyData %>%
    arrange(Channel, Product) %>%
    group_by(Channel, Product) %>%
    mutate_at(vars(Sales), funs_(rollsd_functions))
  print(colnames(MyData))
  return(MyData)
}

This works, but only for one future data point. My situation looks like the following Excel example of a rolling mean over 3 periods: [image: Excel example of a 3-period rolling mean]
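The one-step limitation can be reproduced with a small base R sketch. The helper `lagged_roll_mean` is a hypothetical name, not part of the original code; it is assumed to mirror what `lag(roll_meanr(., k), 1)` computes for a single group, namely the mean of the `k` values strictly before each row.

```r
# Hypothetical helper: mean of the k values strictly before each position.
# Assumed to mirror lag(roll_meanr(x, k), 1) for one group, in base R only.
lagged_roll_mean <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  for (t in seq_len(n)) {
    if (t - k >= 1) {
      out[t] <- mean(x[(t - k):(t - 1)])
    }
  }
  out
}

# Five known days followed by two future days with unknown sales (NA).
sales <- c(10, 20, 30, 40, 50, NA, NA)
feat <- lagged_roll_mean(sales, k = 3)
print(feat)
```

In this sketch the first future row still gets a value, because its window (30, 40, 50) contains only known sales. The second future row is `NA`, because its window already includes the unknown first future value.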

So I can predict only one future data point. What I think I need is to change the function so that, in a loop, it uses the predicted rolling mean as historic data whenever the actual historic data point is missing, in order to fill 45 future data points (45 days), something like the example below: [image: Excel example of the rolling mean filled forward recursively]

My final result should be a single column filled with the values from the last column of the example (the same applies to the standard deviation), which I can then use as a variable in my model. For additional context, I'm using these values:

start_index_lag=4
num_lags=60
start_index_rollfeat=4
num_rollfeat=60
forecast_horizon = 45 #45 days
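The recursive fill described in the question can be sketched in base R for a single series. This is a minimal illustration under stated assumptions, not the asker's pipeline: the function name `fill_recursive` and its arguments are hypothetical, and grouping by Channel and Product is left out.

```r
# Hypothetical helper: extend a series by `horizon` points, where each new
# point is the mean of the previous `window` values. Those values may be
# actual observations or points already filled in by earlier iterations.
fill_recursive <- function(x, window, horizon) {
  stopifnot(length(x) >= window, window >= 1, horizon >= 0)
  n <- length(x)
  out <- c(x, rep(NA_real_, horizon))
  for (h in seq_len(horizon)) {
    idx <- (n + h - window):(n + h - 1)
    out[n + h] <- mean(out[idx])
  }
  out
}

# Three known values, three future values, 3-period window.
sales <- c(10, 20, 30)
filled <- fill_recursive(sales, window = 3, horizon = 3)
print(filled)
```

With these inputs the first filled value is 20 (the mean of 10, 20, 30), the second is about 23.33 (the mean of 20, 30, 20), and the third is about 24.44. For the setup in the question the same idea would be called with `horizon = forecast_horizon` on each Channel and Product group.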

Solution

  • I managed to make it recursive by looping over Dates[i]:

    Dates <- seq(max(train$Date), by = "day", length.out = 45)
    Dates
    # Dates[1] is the last training date, so the loop starts at the second element
    for (i in 2:length(Dates)) {

      # Keep everything up to and including the current forecast date
      df_test <- MyDataTotal %>%
        filter(Date <= Dates[i]) %>%
        group_by(Channel, Product)
      # Optional: drop groups that do not have enough rows
      # %>% filter(n() > 13)

      # Build the features for the unseen days
      test_1 <- df_test %>%
        Create_AR_MA_feats(., start_index_lag, num_lags)
                           # , start_index_rollfeat, num_rollfeat)
      # Keep only the feature rows for the current forecast date
      test_Final <- test_1[test_1$Date == Dates[i], ]