Tags: r, dplyr, tidymodels, r-recipes

step_mutate() couldn't find the function str_remove()


I have a recipe that uses step_mutate() to transform text columns of the Titanic dataset with functions from the stringr package.

library(tidyverse)
library(tidymodels)

extract_title <- function(x) stringr::str_remove(str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"), "\\.")

rf_recipe <- 
  recipe(Survived ~ ., data = titanic_train) %>% 
  step_impute_mode(Embarked) %>% 
  step_mutate(Cabin = if_else(is.na(Cabin), "Yes", "No"),
              Title = if_else(is.na(extract_title(Name)), "Other", extract_title(Name))) %>% 
  step_impute_knn(Age, impute_with = c("Title", "Sex", "SibSp", "Parch")) %>% 
  update_role(PassengerId, Name, new_role = "id")

This set of transformations works perfectly well with rf_recipe %>% prep() %>% bake(new_data = NULL).
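As a quick check of the helper on its own, here is a minimal sketch that applies the same pattern to a few sample names. The name `extract_title_ns` and the sample names are hypothetical; the only change from the helper above is that both stringr calls are namespaced, so the function does not rely on stringr being attached.

```r
# Hypothetical variant of the helper: both stringr calls are namespaced,
# so it runs even when stringr is installed but not attached.
extract_title_ns <- function(x) {
  stringr::str_remove(
    stringr::str_extract(x, "Mr\\.? |Mrs\\.?|Miss\\.?|Master\\.?"),
    "\\."
  )
}

# Illustrative names in the "Surname, Title. Given names" layout.
sample_names <- c(
  "Braund, Mr. Owen Harris",
  "Cumings, Mrs. John Bradley",
  "Heikkinen, Miss. Laina",
  "Palsson, Master. Gosta Leonard",
  "Uruchurtu, Don. Manuel E"
)

titles <- extract_title_ns(sample_names)
print(titles)
# "Mr " "Mrs" "Miss" "Master" NA
# The "Mr" alternative in the pattern ends with a space, so that space is kept.
```

Names without one of the four titles come back as NA, which is what the if_else(is.na(...), "Other", ...) call in step_mutate() handles.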

When I try to fit a random forest model with hyperparameter tuning and 10-fold cross-validation within a workflow, all models fail. The .notes column of the output says explicitly that there was a problem with the mutate() column Title: it couldn't find the function str_remove().

doParallel::registerDoParallel()
rf_res <- 
  tune_grid(
    rf_wf,
    resamples = titanic_folds,
    grid = rf_grid,
    control = control_resamples(save_pred = TRUE)
  )

As this post suggests, I've explicitly told R that str_remove() should be found in the stringr package. Why isn't this working, and what could be causing it?


Solution

  • The error shows up because step_impute_knn(), and subsequently the gower::gower_topn() function, transforms all character columns to factors. To overcome this issue I had to apply the prep() and bake() functions myself, without including the recipe in the workflow.

    prep_recipe <- prep(rf_recipe)  
    train_processed <- bake(prep_recipe, new_data = NULL)
    test_processed <- bake(prep_recipe, new_data = titanic_test %>%
                             mutate(across(where(is.character), as.factor)))
    

    Now the models converge.
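With the recipe applied up front, the workflow only needs a formula and a model. The following is a minimal sketch of that arrangement, not the code from the original post: the toy data frame, `rf_spec_toy`, `wf_toy`, the fold count, and the grid size are all placeholders, and it assumes the ranger package is installed.

```r
library(tidymodels)

set.seed(123)

# Toy stand-in for the baked training data (hypothetical columns and values).
toy_processed <- tibble(
  Survived = factor(sample(c("0", "1"), 200, replace = TRUE)),
  Age      = runif(200, min = 1, max = 80),
  Sex      = factor(sample(c("male", "female"), 200, replace = TRUE))
)

# Placeholder model specification with one tunable parameter.
rf_spec_toy <- rand_forest(min_n = tune(), trees = 100) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# The workflow uses a formula instead of a recipe, because the
# preprocessing has already been done with prep() and bake().
wf_toy <- workflow() %>%
  add_formula(Survived ~ .) %>%
  add_model(rf_spec_toy)

folds_toy <- vfold_cv(toy_processed, v = 5)

rf_res_toy <- tune_grid(
  wf_toy,
  resamples = folds_toy,
  grid = 3,
  control = control_grid(save_pred = TRUE)
)

collect_metrics(rf_res_toy)
```

With the real data, train_processed would take the place of the toy data frame. Id columns such as PassengerId and Name would likely need to be dropped first, since a formula has no notion of recipe roles.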