r, time-series, forecasting

How to use the ets() function for a time series with predictors in R


I have this dataset:

dat1197=structure(list(Dates = structure(c(18993, 19024, 19052, 19083, 
19113, 19144, 19174, 19205, 19236, 19266, 19297, 19327, 19358, 
19389, 19417, 19448, 19478, 19509, 19539, 19570, 19601, 19631, 
19662, 19692, 19723), class = "Date"), total = c(290107L, 198827L, 
369809L, 328653L, 230351L, 319991L, 361509L, 263837L, 423810L, 
267680L, 195494L, 236771L, 202171L, 286674L, 313943L, 303044L, 
307096L, 170928L, 144136L, 189956L, 232079L, 201174L, 199433L, 
150333L, 195069L), conv_count = c(31L, 9414L, 10662L, 10817L, 
10544L, 10824L, 11828L, 13365L, 11795L, 12731L, 12961L, 11215L, 
16180L, 20123L, 16419L, 16190L, 17597L, 16966L, 18805L, 16072L, 
18493L, 17952L, 24781L, 25582L, 712L), unique_id_publishers = c(4270L, 
4838L, 4227L, 4628L, 4300L, 5178L, 4297L, 8440L, 7616L, 10328L, 
7959L, 6239L, 7429L, 7748L, 7189L, 6837L, 7393L, 6773L, 7028L, 
7395L, 7473L, 10730L, 8814L, 64489L, 5464L), median_seconds = c(7881.49604743083, 
7881.49604743083, 488.966666666667, 488.966666666667, 531.916666666667, 
488.966666666667, 531.916666666667, 595, 574.75, 604.25, 595, 
721.25, 595, 1000.75, 1479.5, 1196.5, 2514.5, 2324, 2642.5, 828, 
4821, 4344.5, 6468, 3941, 8822), total_forecasted = c(252179.383228222, 
211378.341678112, 298854.813540318, 297876.900653167, 298769.06537375, 
297419.968269761, 293248.366585249, 282633.709438049, 290279.426901374, 
283780.066745602, 284744.759870922, 292012.326293479, 271309.781396652, 
249031.822264103, 259416.064075342, 264210.373105608, 241258.234178068, 
246833.896200638, 234745.99587691, 268889.359224122, 208522.098966603, 
214275.525057851, 159854.631183384, 144778.271030721, 236571.818861993
)), row.names = c(NA, -25L), class = "data.frame")

I want to perform a time series analysis using predictors. My dependent variable is total; conv_count, unique_id_publishers, and median_seconds are the predictors that should explain it.

Here is my attempt. The code loops over the ETS model specifications to find the one with the maximum R-squared:

library(forecast)
library(zoo)
library(data.table)  # needed for as.data.table() below

# Convert the dataset to data.table
dat1197 <- as.data.table(dat1197)

# Convert the Dates column to Date format
dat1197$Dates <- as.Date(paste(dat1197$Dates, "-01", sep=""))
# Create a time series without a Dates column

# Dividing the sample into training and test
train_data <- dat1197[Dates < as.Date("2023-11-01")]
test_data <- dat1197[Dates >= as.Date("2023-11-01") & Dates <= as.Date("2024-01-01")]
ts_data <- zoo(train_data[, c("total")])
# Specifying predictors
xreg <- train_data[, c("conv_count", "unique_id_publishers", "median_seconds")]
# Convert predictors to a numeric matrix
xreg_matrix <- as.matrix(xreg)

best_model <- NULL
best_r_squared <- 0

# Loop for selecting ETS model parameters with maximum R-squared
for (error in c("A", "M")) {
   for (trend in c("N", "A", "Ad", "M")) {
     for (seasonal in c("N", "A", "Ad", "M")) {
       model <- ets(ts_data, model = paste0(error, trend, seasonal), xreg = xreg_matrix)
       r_squared <- accuracy(model)$R2
       if (r_squared > best_r_squared) {
         best_model <- model
         best_r_squared <- r_squared
       }
     }
   }
}

# Obtaining forecasts for the test period
forecast_data <- forecast(best_model, xreg = as.matrix(test_data[, c("conv_count", "unique_id_publishers", "median_seconds")]), newdata = as.matrix(test_data[, c("conv_count", "unique_id_publishers", "median_seconds")]), h = nrow(test_data))

and I get this error:

Error in ets(ts_data, model = paste0(error, trend, seasonal), xreg = xreg_matrix) :
   No model able to be fitted

What did I do wrong, and how do I correctly fit a time series model with my predictors? Any help is appreciated.


Solution

    1. ets() does not have an xreg argument. See the help files. The smooth::es() function does allow for covariates.
    2. There is no point looping over models in this way, as ets() does that internally if you don't specify the model argument.
    3. R-squared is a bad way to select a prediction model. It does not allow for model complexity, and it measures correlation rather than forecast accuracy. Imagine forecasts that are exactly half the value of the corresponding observations to see the problem.
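To illustrate point 2, here is a minimal sketch that lets ets() choose the model form itself and then checks the forecasts against the held-out months. It uses only the total column from the question, so it does not include the predictors (for those, point 1 suggests smooth::es()). The monthly ts() setup starting in January 2022 and the train/test split at November 2023 are assumptions read off the Dates column and the question's code.

```r
library(forecast)

# The 'total' column from the question (Jan 2022 to Jan 2024, monthly)
total <- c(290107, 198827, 369809, 328653, 230351, 319991, 361509,
           263837, 423810, 267680, 195494, 236771, 202171, 286674,
           313943, 303044, 307096, 170928, 144136, 189956, 232079,
           201174, 199433, 150333, 195069)

# Assumption: treat the data as a regular monthly series
y <- ts(total, start = c(2022, 1), frequency = 12)

# Same split as in the question: train before Nov 2023, test from Nov 2023
train <- window(y, end = c(2023, 10))
test  <- window(y, start = c(2023, 11))

# No 'model' argument: ets() searches over the candidate models
# and picks one by information criterion (AICc by default)
fit <- ets(train)

# Forecast the test period and compare with the observed values
fc <- forecast(fit, h = length(test))
accuracy(fc, test)
```

The "Test set" row of the accuracy() output reports out-of-sample measures such as RMSE and MAE, which are a more suitable basis for comparing forecasting models than in-sample R-squared.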
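The thought experiment in point 3 can be checked in a few lines of base R. This is only a sketch: the "forecasts" below are made up by halving the first six observed values of total, and the names actual and half_fc are illustrative.

```r
# First six observed values of 'total' from the question
actual <- c(290107, 198827, 369809, 328653, 230351, 319991)

# Hypothetical forecasts that are exactly half of each observation
half_fc <- actual / 2

# R-squared as the squared correlation between forecasts and observations
r_squared <- cor(actual, half_fc)^2

# Mean absolute percentage error of the same forecasts
mape <- mean(abs((actual - half_fc) / actual)) * 100

r_squared  # perfect correlation, despite the bias
mape       # every forecast is off by 50%
```

The squared correlation is perfect even though every forecast misses by half, which is why an error measure on held-out data is a better selection criterion.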