I have a recipe with a step_mutate() step in the middle that performs text transformations on the Titanic dataset using the stringr package.
library(tidyverse)
library(tidymodels)
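# Extract the honorific (Mr, Mrs, Miss, Master) from Name; NA if none matches.
# "Mrs" is tried before "Mr" so "Mrs." is not cut short to "Mr".
extract_title <- function(x) {
  stringr::str_remove(
    stringr::str_extract(x, "Mrs\\.?|Mr\\.?|Miss\\.?|Master\\.?"),
    "\\."
  )
}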
rf_recipe <-
  recipe(Survived ~ ., data = titanic_train) %>%
  step_impute_mode(Embarked) %>%
  step_mutate(
    Cabin = if_else(is.na(Cabin), "Yes", "No"),
    Title = if_else(is.na(extract_title(Name)), "Other", extract_title(Name))
  ) %>%
  step_impute_knn(Age, impute_with = c("Title", "Sex", "SibSp", "Parch")) %>%
  update_role(PassengerId, Name, new_role = "id")
This set of transformations works as expected with rf_recipe %>% prep() %>% bake(new_data = NULL).
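To see what the Title step produces, the helper can be run on a few names outside the recipe. This sketch uses a version of extract_title() with both stringr calls namespaced and "Mrs" listed before "Mr" in the alternation; that ordering is an assumption made here, not something from the question. With a pattern of the form "Mr\\.? |Mrs\\.?|...", the space that stops "Mr" from matching inside "Mrs" stays in the result, giving "Mr " with a trailing space. The sample names follow the Titanic "Surname, Title. Given names" format.

```r
# Version of the helper with both stringr calls namespaced (assumption: "Mrs"
# is tried before "Mr", so no trailing space is needed in the pattern)
extract_title <- function(x) {
  stringr::str_remove(
    stringr::str_extract(x, "Mrs\\.?|Mr\\.?|Miss\\.?|Master\\.?"),
    "\\."
  )
}

# A few passenger names in the Titanic "Surname, Title. Given names" format
passenger_names <- c(
  "Braund, Mr. Owen Harris",
  "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
  "Heikkinen, Miss. Laina",
  "Palsson, Master. Gosta Leonard",
  "Uruchurtu, Don. Manuel E"
)

raw_titles <- extract_title(passenger_names)

# Same fallback as in step_mutate(): names without a matched title become "Other"
titles <- dplyr::if_else(is.na(raw_titles), "Other", raw_titles)
print(titles)
# expected: "Mr" "Mrs" "Miss" "Master" "Other"
```

Computing the title once and reusing it also avoids calling extract_title() twice per row, as the step_mutate() call does.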
When I try to fit a random forest model with hyperparameter tuning and 10-fold cross-validation inside a workflow, all models fail. The .notes column explicitly reports a problem with mutate() column Title: couldn't find the function str_remove().
doParallel::registerDoParallel()
rf_res <-
tune_grid(
rf_wf,
resamples = titanic_folds,
grid = rf_grid,
control = control_resamples(save_pred = TRUE)
)
As this post suggests, I've explicitly told R that str_remove() should be found in the stringr package (stringr::str_remove). Why isn't this working, and what could be causing it?
The error shows up because step_impute_knn() (formerly step_knn_impute()), and the gower::gower_topn() function it relies on, converts all character columns to factors. To work around this, I applied prep() and bake() myself instead of including the recipe in the workflow.
prep_recipe <- prep(rf_recipe)
train_processed <- bake(prep_recipe, new_data = NULL)
test_processed <- bake(
  prep_recipe,
  new_data = titanic_test %>% mutate(across(where(is.character), as.factor))
)
Now the models fit without errors.
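A minimal sketch of how tuning can then run on the baked training data, using a plain formula in the workflow instead of the recipe. It is self-contained, so a small simulated tibble stands in for train_processed; the model specification (rf_spec), the ranger engine, the column set, the factor outcome and the 5 folds are all assumptions, since the question does not show rf_wf, rf_grid or titanic_folds. Because bake() also returns id-role columns such as PassengerId, those are dropped before fitting Survived ~ ., otherwise they would be used as predictors.

```r
library(tidymodels)  # attaches tibble, dplyr, parsnip, workflows, rsample, tune, ...

set.seed(2021)
n <- 200

# Hypothetical stand-in for train_processed: already-imputed, Titanic-like
# columns with a factor outcome (simulated, not real Titanic data)
train_processed <- tibble(
  PassengerId = seq_len(n),
  Survived = factor(sample(c("0", "1"), n, replace = TRUE)),
  Sex      = factor(sample(c("male", "female"), n, replace = TRUE)),
  Title    = factor(sample(c("Mr", "Mrs", "Miss", "Master", "Other"), n, replace = TRUE)),
  Age      = runif(n, 1, 80),
  Fare     = rexp(n, rate = 1 / 30)
)

# bake() keeps id-role columns, so remove them before using `Survived ~ .`
model_data <- train_processed %>% select(-PassengerId)

# Hypothetical model spec: tune mtry and min_n of a ranger random forest
rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = 200) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# A formula preprocessor replaces the recipe because the data is already processed
rf_wf <- workflow() %>%
  add_formula(Survived ~ .) %>%
  add_model(rf_spec)

folds <- vfold_cv(model_data, v = 5, strata = Survived)

# grid = 4 lets tune_grid() build 4 candidates; the mtry range is finalized from the data
rf_res <- tune_grid(
  rf_wf,
  resamples = folds,
  grid = 4,
  control = control_grid(save_pred = TRUE)
)

collect_metrics(rf_res)
```

With the real data, the simulated tibble would be replaced by train_processed without PassengerId and Name (and any other unused character columns, such as Ticket if present), and v = 10 would match the 10-fold setup in the question.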