Recipe for XGBoost tidymodels. Error: unused argument (values)

I am currently experimenting with hyperparameter tuning for XGBoost regression on time series, using a Latin hypercube sampling strategy. When I run the code below, all the models fail during the tune_grid() operation. The cause seems to be the recipe object: I used step_dummy() to transform the value column of my univariate time series. The .notes object contains the error message: preprocessor 1/1: Error: unused argument (values)

I found some other posts where this issue popped up, but none of the solutions helped in my case.

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))
library(timetk)
library(tidymodels)
library(modeltime)
library(tictoc)


dates <- ymd("2016-01-01") + months(0:59)
fake_values <- 
  c(64,61, 90,138,240,141,123, 9,180,95,84,69,76,104,122,183,200,268,225,
    132,84,159,64,131,98,138,179,187,303,257,175,133,145,36,3,134,137,308,
    84,114,310,266,123,131,87,94,86,100,105,147,159,232,312,337,285,188,257,10,98,27
  )
df <- bind_cols(fake_values, dates) %>% 
  rename(c(values = ...1, dates = ...2)
  )

# training- and test set
data_splits <- initial_time_split(df, prop = 0.8)
data_train  <- training(data_splits)
data_test   <- testing(data_splits)

resampling_strategy <- 
  data_train %>%
  time_series_cv(
    initial = "12 months",
    assess = "3 months",
    skip = "3 months",
    cumulative  = TRUE,
    slice_limit = 3
)

# recipe
basic_rec <- recipe(values ~ ., data = data_train)  %>% 
  step_dummy(all_nominal(values), -all_outcomes()) 

basic_rec %>% prep()

Solution

It looks like the problem is that the date predictor isn't being converted to numeric values, which xgboost needs. You did use step_dummy(), but dates are not factor/nominal variables, so all_nominal() does not select them. If you choose the date column explicitly, this is what happens:

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

dates <- ymd("2016-01-01") + months(0:59)
fake_values <- 
  c(64,61, 90,138,240,141,123, 9,180,95,84,69,76,104,122,183,200,268,225,
    132,84,159,64,131,98,138,179,187,303,257,175,133,145,36,3,134,137,308,
    84,114,310,266,123,131,87,94,86,100,105,147,159,232,312,337,285,188,257,10,98,27
  )
df <- bind_cols(fake_values, dates) %>% 
  rename(c(values = ...1, dates = ...2)
  )
#> New names:
#> * NA -> ...1
#> * NA -> ...2

# training- and test set
data_splits <- initial_time_split(df, prop = 0.8)
data_train  <- training(data_splits)
data_test   <- testing(data_splits)

basic_rec <- recipe(values ~ ., data = data_train) %>% 
  step_dummy(dates) 

basic_rec %>% prep() %>% bake(new_data = NULL)
#> Warning: The following variables are not factor vectors and will be ignored:
#> `dates`
#> Error: The `terms` argument in `step_dummy` did not select any factor columns.

Created on 2021-10-27 by the reprex package (v2.0.1)
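A side note on the literal message in the question, "unused argument (values)": as far as I can tell, it comes from the selector call itself. all_nominal() takes no column arguments, so all_nominal(values) fails once the recipe is prepped. The sketch below reproduces that under stated assumptions: the outcome values are placeholders rather than the question's data, and bad_rec and res are illustrative names.

```r
library(tidymodels)
library(lubridate)

# Placeholder data with the same column layout as the question
# (the outcome values are illustrative, not the original series)
df <- tibble(
  values = as.numeric(1:60),
  dates  = ymd("2016-01-01") + months(0:59)
)

# all_nominal() is called with an argument here, as in the question
bad_rec <- recipe(values ~ ., data = df) %>%
  step_dummy(all_nominal(values), -all_outcomes())

# Selectors are captured lazily and only evaluated by prep(),
# so the failure is caught at that point rather than when the step is added
res <- tryCatch(prep(bad_rec), error = function(e) e)
inherits(res, "error")
```

Dropping the argument, i.e. step_dummy(all_nominal(), -all_outcomes()), should avoid that particular error, although the step still has no factor columns to work on in this data.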

You probably want to handle dates with something like step_date().
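Here is a minimal sketch of that idea, not a definitive recipe for your model. It assumes a recipes version that provides all_nominal_predictors(), uses placeholder outcome values rather than your data, and picks month and year as date features purely for illustration; date_rec and baked are illustrative names.

```r
library(tidymodels)
library(lubridate)

# Placeholder data with the same column layout as the question
# (the outcome values are illustrative, not the original series)
df <- tibble(
  values = as.numeric(1:60),
  dates  = ymd("2016-01-01") + months(0:59)
)

date_rec <- recipe(values ~ ., data = df) %>%
  # Derive a month factor and a numeric year from the date column
  step_date(dates, features = c("month", "year")) %>%
  # Drop the raw date column, which xgboost cannot use directly
  step_rm(dates) %>%
  # The month factor is nominal, so step_dummy() now has something to encode
  step_dummy(all_nominal_predictors())

baked <- date_rec %>% prep() %>% bake(new_data = NULL)
glimpse(baked)
```

After baking, every remaining column should be numeric, which is what xgboost expects. Which date features are useful depends on the series, so treat the features argument as something to experiment with.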