Here is the model:
library(tidymodels)
library(treesnip)  # provides the "catboost" engine for boost_tree()

cb_spec <- boost_tree(
  mode = "classification",
  trees = 1000,
  tree_depth = tune(),
  min_n = tune(),
  mtry = tune(),
  learn_rate = tune()
) %>%
  set_engine("catboost", loss_function = "Logloss", task_type = "GPU")
Here is the recipe:
cb_rec <- recipe(covid_vaccination ~ ., data = cb_train) %>%
  step_unknown(all_nominal_predictors()) %>%
  # step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  step_impute_median(all_numeric_predictors())
I combine them:
cb_wf <- workflow() %>%
  add_model(cb_spec) %>%
  add_recipe(cb_rec)
Then I try to tune to find optimal hyperparameters:
cb_tune <- tune_grid(
  object = cb_wf,
  resamples = cb_folds,
  grid = cb_grid,
  metrics = metric_set(roc_auc),
  control = control_grid(verbose = TRUE)
)
Here is the error I get:
Error in catboost.from_matrix(as.matrix(float_and_cat_features_data), : Unsupported label type, expecting double or integer.
I have already confirmed that categorical variables are changed to factors. There are absolutely no character type vectors in my dataset.
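For anyone hitting the same message: it refers to the label, i.e. the outcome column, not to the predictors, so it may be worth inspecting that column as well. Below is a minimal sketch of such a check in base R. The data frame `toy_train` and its `age` and `region` columns are hypothetical stand-ins for `cb_train`; only the name `covid_vaccination` comes from the code above.

```r
# Toy stand-in for cb_train (hypothetical data; only the outcome name
# covid_vaccination is taken from the post).
toy_train <- data.frame(
  covid_vaccination = factor(c("yes", "no", "yes", "no")),
  age = c(34, 51, 27, 63),
  region = factor(c("north", "south", "south", "north"))
)

# First class of every column, outcome included.
col_classes <- vapply(toy_train, function(x) class(x)[1], character(1))
print(col_classes)

# The error names the label, so check the outcome column specifically.
outcome_is_numeric <- is.numeric(toy_train$covid_vaccination)
print(outcome_is_numeric)

# For inspection only: what a 0/1 integer encoding of a two-level
# factor looks like (levels are sorted alphabetically: "no", "yes").
label_int <- as.integer(toy_train$covid_vaccination) - 1L
print(label_int)
```

Note that parsnip expects a factor outcome when `mode = "classification"`, so recoding the outcome to integers in the training data is probably not a fix on its own; any such conversion would presumably have to happen inside the engine wrapper.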
Solved, thanks to Mikhail Rudakov, who made their own fork of treesnip as a workaround.