
How can I prep a recipe that includes tunable arguments?

As you can see from my code, I am trying to include feature selection in my tidymodels workflow. I am using some Kaggle data to predict customer churn.

To apply the same preprocessing to the training and test data, I bake the recipe after calling the prep() function.

However, if I want to tune the top_p argument of the step_select_roc() function, I do not know how to prep() the recipe afterwards. Doing it as in my reprex results in an error.

Maybe I have to adapt my workflow and separate some recipe tasks to get the job done. What is the best approach to achieve this?

#### LIBS

suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(themis))
suppressPackageStartupMessages(library(recipeselectors))


#### INPUT

# get dataset from: https://www.kaggle.com/shrutimechlearn/churn-modelling
data <- fread("Churn_Modelling.csv")


# split data
set.seed(seed = 1972) 
train_test_split <-
  rsample::initial_split(
    data = data,     
    prop = 0.80   
  ) 
train_tbl <- train_test_split %>% training() 
test_tbl  <- train_test_split %>% testing() 


#### FEATURE ENGINEERING

# Define the recipe
recipe <- recipe(Exited ~ ., data = train_tbl) %>%
  step_rm(one_of("RowNumber", "Surname")) %>%
  update_role(CustomerId, new_role = "Helper") %>%
  step_num2factor(all_outcomes(),
                  levels = c("No", "Yes"),
                  transform = function(x) {x + 1}) %>%
  step_normalize(all_numeric(), -has_role(match = "Helper")) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_corr(all_numeric(), -has_role("Helper")) %>%
  step_nzv(all_predictors()) %>%
  step_select_roc(all_predictors(), outcome = "Exited", top_p = tune()) %>%  
  prep()


# Bake it
train_baked <- recipe %>%  bake(train_tbl)
test_baked <- recipe %>% bake(test_tbl)

Solution

You can't prep() a recipe that has tunable arguments. Think of prep() as the recipe analogue of fit() for a model: you can't fit a model until its hyperparameters have values.
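To make the analogy concrete: once the tunable argument has a value, prep() works again. The sketch below is a minimal illustration, assuming you simply want to fix the value by hand; it uses tune::finalize_recipe() to fill in num_comp for the same USArrests PCA recipe used in the reprex. The value 2 is an arbitrary choice for illustration, not a recommendation.

```r
library(tidymodels)  # attaches recipes, tune, tibble and the %>% pipe

# Same recipe as in the reprex: num_comp is marked for tuning
rec <- recipe(~ ., data = USArrests) %>%
  step_normalize(all_numeric()) %>%
  step_pca(all_numeric(), num_comp = tune())

# Supply a concrete value for the tuning parameter (2 is arbitrary here);
# the column name must match the tune() id, which defaults to the argument name
rec_final <- finalize_recipe(rec, tibble(num_comp = 2))

# With every argument set, prep() can estimate the PCA loadings
rec_prepped <- prep(rec_final, training = USArrests)

# new_data = NULL returns the processed training set
pcs <- bake(rec_prepped, new_data = NULL)
print(pcs)
```

Hand-picking a value skips the actual tuning, so it is mainly useful once you already know which value you want.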

library(recipes)

rec <- recipe( ~ ., data = USArrests) %>%
  step_normalize(all_numeric()) %>%
  step_pca(all_numeric(), num_comp = tune::tune())

prep(rec, training = USArrests)
#> Error in `prep()`:
#> ! You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'num_comp'. Do you want to use a tuning function such as `tune_grid()`?

Created on 2022-02-22 by the reprex package (v2.0.1)
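To actually tune the argument, leave the recipe unprepped, put it in a workflow together with a model, and let tune_grid() prep it inside each resample. The sketch below shows that pattern on USArrests, assuming Murder as an arbitrary outcome, a plain linear_reg() model, 5-fold cross-validation and a hand-written grid for num_comp; all of these are illustrative choices, not part of the original answer.

```r
library(tidymodels)

set.seed(123)

# Murder is used as the outcome purely for illustration
rec <- recipe(Murder ~ ., data = USArrests) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = tune())

# No prep() here: the workflow preps the recipe on each analysis set
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

folds <- vfold_cv(USArrests, v = 5)

# Candidate values for the tunable recipe argument (3 predictors -> at most 3 PCs)
grid <- tibble(num_comp = 1:3)

tuned <- tune_grid(wf, resamples = folds, grid = grid)

# Pick the value with the lowest resampled RMSE
best <- select_best(tuned, metric = "rmse")

# Plug the winning value into the workflow and fit on all of the data
final_fit <- finalize_workflow(wf, best) %>%
  fit(data = USArrests)

# The fitted workflow holds a prepped recipe that can be baked as before
prepped_rec <- extract_recipe(final_fit)
baked <- bake(prepped_rec, new_data = USArrests)
print(baked)
```

For the churn question, the same idea would mean dropping the trailing prep() from the recipe, adding it to a workflow with a classification model, and passing a grid over top_p to tune_grid(). This assumes recipeselectors supplies the tuning-parameter information for top_p. After finalizing and fitting, predict(final_fit, test_tbl) applies the prepped recipe to the test data, so a separate bake() is only needed if you want the processed predictors themselves.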