Here is the model:
cb_spec <- boost_tree(
  mode = "classification",
  trees = 1000,
  tree_depth = tune(),
  min_n = tune(),
  mtry = tune(),
  learn_rate = tune()
) %>%
  set_engine("catboost", loss_function = "Logloss", task_type = "GPU")
Here is the recipe:
cb_rec <- recipe(covid_vaccination ~ ., data = cb_train) %>%
  step_unknown(all_nominal_predictors()) %>%
  # step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_nzv(all_predictors())
I combine them:
cb_wf <- workflow() %>%
  add_model(cb_spec) %>%
  add_recipe(cb_rec)
Then I try to tune to find optimal hyperparameters:
cb_tune <- tune_grid(
  object = cb_wf,
  resamples = cb_folds,
  grid = cb_grid,
  metrics = metric_set(roc_auc),
  control = control_grid(verbose = TRUE)
)
Here is the error I get:
Error in catboost.from_matrix(as.matrix(float_and_cat_features_data), : Unsupported label type, expecting double or integer.
I have already confirmed that all categorical variables have been converted to factors. There are no character vectors anywhere in my dataset.
This has been confirmed as a known issue when using catboost through tidymodels. Check their GitHub issues for more information and the current workaround.
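For context, the message complains about the label (the outcome), not the predictors, so having factors instead of characters among the predictors does not help here. Below is a minimal base R sketch of why a factor outcome can fail a "double or integer" check. The `outcome` vector is made up for illustration and stands in for `covid_vaccination`; I am also assuming the check behaves like `is.double()` / `is.integer()`, which is what the wording of the message suggests.

```r
# Hypothetical outcome vector standing in for covid_vaccination
outcome <- factor(c("no", "yes", "yes", "no"))

# A factor is stored as integer codes internally...
typeof(outcome)

# ...but base R reports FALSE for both type checks on a factor
is.integer(outcome)
is.double(outcome)

# Converting the factor codes to 0/1 gives a plain integer vector
# (levels are sorted alphabetically, so "no" becomes 0 and "yes" becomes 1)
label <- as.integer(outcome) - 1L
is.integer(label)
```

Note that this only illustrates the type mismatch. With `mode = "classification"`, tidymodels expects the outcome column to stay a factor, so recoding it in `cb_train` is not a fix by itself; the conversion has to happen where the engine is called, which is what the workaround in the GitHub issues addresses.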