I am currently experimenting with hyperparameter tuning for XGBoost regression on a time series, using a Latin hypercube sampling strategy. When I run the code below, all of the models fail during the tune_grid() operation. The cause seems to be the recipe object: I used step_dummy() to transform the value column of my univariate time series. The .notes object contains the error message: preprocessor 1/1: Error: unused argument (values)
I found other posts where this issue popped up, but none of the solutions helped in my case.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))
library(timetk)
library(tidymodels)
library(modeltime)
library(tictoc)
dates <- ymd("2016-01-01")+ months(0:59)
fake_values <-
c(64,61, 90,138,240,141,123, 9,180,95,84,69,76,104,122,183,200,268,225,
132,84,159,64,131,98,138,179,187,303,257,175,133,145,36,3,134,137,308,
84,114,310,266,123,131,87,94,86,100,105,147,159,232,312,337,285,188,257,10,98,27
)
df <- bind_cols(fake_values, dates) %>%
rename(c(values = ...1, dates = ...2)
)
# training- and test set
data_splits <- initial_time_split(df, prop = 0.8)
data_train <- training(data_splits)
data_test <- testing(data_splits)
resampling_strategy <-
data_train %>%
time_series_cv(
initial = "12 months",
assess = "3 months",
skip = "3 months",
cumulative = TRUE,
slice_limit = 3
)
# recipe
basic_rec <- recipe(values ~ ., data = data_train) %>%
step_dummy(all_nominal(values), -all_outcomes())
basic_rec %>% prep()
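It looks like the problem is that those date predictors aren't getting converted to numeric values, which xgboost needs. You did use step_dummy(), but dates are not factor/nominal variables, so they are not getting chosen by all_nominal(). (The unused argument (values) message itself most likely comes from passing a column name to all_nominal(), which takes no arguments.) If you explicitly choose the date column, this is what happens: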
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
dates <- ymd("2016-01-01") + months(0:59)
fake_values <-
c(64,61, 90,138,240,141,123, 9,180,95,84,69,76,104,122,183,200,268,225,
132,84,159,64,131,98,138,179,187,303,257,175,133,145,36,3,134,137,308,
84,114,310,266,123,131,87,94,86,100,105,147,159,232,312,337,285,188,257,10,98,27
)
df <- bind_cols(fake_values, dates) %>%
rename(c(values = ...1, dates = ...2)
)
#> New names:
#> * NA -> ...1
#> * NA -> ...2
# training- and test set
data_splits <- initial_time_split(df, prop = 0.8)
data_train <- training(data_splits)
data_test <- testing(data_splits)
basic_rec <- recipe(values ~ ., data = data_train) %>%
step_dummy(dates)
basic_rec %>% prep() %>% bake(new_data = NULL)
#> Warning: The following variables are not factor vectors and will be ignored:
#> `dates`
#> Error: The `terms` argument in `step_dummy` did not select any factor columns.
Created on 2021-10-27 by the reprex package (v2.0.1)
You probably want to handle dates with something like step_date().
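As a rough sketch of what that could look like: the recipe below derives month and year features from the date column, drops the raw date, and then dummy-encodes the resulting month factor so that every predictor is numeric. The toy_train tibble is a hypothetical stand-in built from the first 12 values of the example series, and the choice of features = c("month", "year") is an assumption for illustration, not a recommendation for your data.

```r
library(recipes)  # also attaches dplyr, which provides %>%
library(tibble)

# Hypothetical stand-in for data_train: first 12 months of the example series
toy_train <- tibble(
  values = c(64, 61, 90, 138, 240, 141, 123, 9, 180, 95, 84, 69),
  dates  = seq(as.Date("2016-01-01"), by = "month", length.out = 12)
)

date_rec <- recipe(values ~ ., data = toy_train) %>%
  # Derive dates_month (a factor) and dates_year (numeric) from the date column
  step_date(dates, features = c("month", "year")) %>%
  # Drop the raw date column so only the derived features remain
  step_rm(dates) %>%
  # The month factor is nominal, so it can now be dummy-encoded
  step_dummy(all_nominal_predictors())

baked <- date_rec %>% prep() %>% bake(new_data = NULL)
names(baked)
```

In your own code you would use data_train in place of toy_train and pass the recipe to the workflow you tune with tune_grid(). With only a few years of monthly data, the year feature carries little information, so you may want to experiment with which features to keep.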