
How to prep a recipe, including tunable arguments?


As my code shows, I am trying to include feature selection in my tidymodels workflow. I am using Kaggle data to predict customer churn.

To apply the same preprocessing to the training and test data, I call prep() on the recipe and then bake() it.

However, if I want to tune the top_p argument of step_select_roc(), I do not know how to prep() the recipe afterwards. Calling prep() as in my reprex results in an error.

Maybe I have to adapt my workflow and separate some recipe tasks to get the job done. What is the best approach to achieve this?

#### LIBS

suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(themis))
suppressPackageStartupMessages(library(recipeselectors))


#### INPUT

# get dataset from: https://www.kaggle.com/shrutimechlearn/churn-modelling
data <- fread("Churn_Modelling.csv")


# split data
set.seed(seed = 1972) 
train_test_split <-
  rsample::initial_split(
    data = data,     
    prop = 0.80   
  ) 
train_tbl <- train_test_split %>% training() 
test_tbl  <- train_test_split %>% testing() 


#### FEATURE ENGINEERING

# Define the recipe
recipe <- recipe(Exited ~ ., data = train_tbl) %>%
  step_rm(one_of("RowNumber", "Surname")) %>%
  update_role(CustomerId, new_role = "Helper") %>%
  step_num2factor(all_outcomes(),
                  levels = c("No", "Yes"),
                  transform = function(x) {x + 1}) %>%
  step_normalize(all_numeric(), -has_role(match = "Helper")) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_corr(all_numeric(), -has_role("Helper")) %>%
  step_nzv(all_predictors()) %>%
  step_select_roc(all_predictors(), outcome = "Exited", top_p = tune()) %>%  
  prep()


# Bake it
train_baked <- recipe %>%  bake(train_tbl)
test_baked <- recipe %>% bake(test_tbl) 

Solution

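  • You can't prep() a recipe that has tunable arguments. Think of prep() for a recipe as analogous to fit() for a model: you can't fit a model until its hyperparameters have been set, and you can't prep a recipe until every tune() placeholder has been replaced with a concrete value.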

    library(recipes)
    #> Loading required package: dplyr
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    #> 
    #> Attaching package: 'recipes'
    #> The following object is masked from 'package:stats':
    #> 
    #>     step
    
    rec <- recipe( ~ ., data = USArrests) %>%
      step_normalize(all_numeric()) %>%
      step_pca(all_numeric(), num_comp = tune::tune())
    
    prep(rec, training = USArrests)
    #> Error in `prep()`:
    #> ! You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'num_comp'. Do you want to use a tuning function such as `tune_grid()`?
    

    Created on 2022-02-22 by the reprex package (v2.0.1)
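
One way around the error is to tune first and prep afterwards. The following is a minimal sketch of that order of operations, not the asker's exact pipeline: it assumes the built-in mtcars data, a plain linear regression, and step_pca() standing in for step_select_roc(), none of which come from the question. The recipe goes into a workflow, tune_grid() evaluates candidate values on resamples, and finalize_recipe() fills in the chosen value so that prep() and bake() work as usual.

```r
library(tidymodels)

set.seed(1972)

# Assumption: mtcars and 5-fold CV are used only to keep the sketch self-contained
folds <- vfold_cv(mtcars, v = 5)

# Recipe with a tune() placeholder; do NOT call prep() on it yet
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = tune())

# Bundle the unprepped recipe with a model (linear_reg() is an illustrative choice)
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# Evaluate candidate values of num_comp on the resamples
res <- tune_grid(wf, resamples = folds, grid = tibble(num_comp = 1:4))

# Pick the best candidate and substitute it for the tune() placeholder
best <- select_best(res, metric = "rmse")
final_rec <- finalize_recipe(rec, best)

# Now the recipe has no tunable arguments left, so prep() and bake() succeed
prepped <- prep(final_rec, training = mtcars)
baked <- bake(prepped, new_data = NULL)
```

The same pattern should carry over to any step whose argument is marked with tune(): leave the recipe unprepped while tuning, then finalize it with the selected parameters before calling prep() and baking the training and test sets.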