As you can see from my code, I am trying to include feature selection in my tidymodels workflow. I am using some Kaggle data to predict customer churn.
To apply the same preprocessing to the training and test data, I `prep()` the recipe and then `bake()` it.
However, if I want to tune the `top_p` argument of `step_select_roc()`, I do not know how to `prep()` the recipe afterwards. Doing it as in my reprex results in an error.
Maybe I have to adapt my workflow and separate some recipe tasks to get the job done. What is the best approach to achieve this?
```r
#### LIBS
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(themis))
suppressPackageStartupMessages(library(recipeselectors))

#### INPUT
# get dataset from: https://www.kaggle.com/shrutimechlearn/churn-modelling
data <- fread("Churn_Modelling.csv")

# split data
set.seed(seed = 1972)
train_test_split <-
  rsample::initial_split(
    data = data,
    prop = 0.80
  )
train_tbl <- train_test_split %>% training()
test_tbl <- train_test_split %>% testing()

#### FEATURE ENGINEERING
# Define the recipe
recipe <- recipe(Exited ~ ., data = train_tbl) %>%
  step_rm(one_of("RowNumber", "Surname")) %>%
  update_role(CustomerId, new_role = "Helper") %>%
  step_num2factor(all_outcomes(),
                  levels = c("No", "Yes"),
                  transform = function(x) {x + 1}) %>%
  step_normalize(all_numeric(), -has_role(match = "Helper")) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_corr(all_numeric(), -has_role("Helper")) %>%
  step_nzv(all_predictors()) %>%
  step_select_roc(all_predictors(), outcome = "Exited", top_p = tune()) %>%
  prep()  # this call errors because top_p is still a tune() placeholder

# Bake it
train_baked <- recipe %>% bake(train_tbl)
test_baked <- recipe %>% bake(test_tbl)
```
You can't `prep()` a recipe that has tuneable arguments. Think of `prep()` as the recipe analogue of `fit()` for a model: you can't fit a model if you haven't set its hyperparameters.
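To make the analogy concrete, here is a minimal sketch: once `num_comp` has a concrete value, filled in here by `tune::finalize_recipe()`, the same kind of recipe preps and bakes without complaint. The value `2` is an arbitrary choice for illustration only; in practice it would come from a tuning run.

```r
library(recipes)
library(tibble)

rec <- recipe(~ ., data = USArrests) %>%
  step_normalize(all_numeric()) %>%
  step_pca(all_numeric(), num_comp = tune::tune())

# Replace the tune() placeholder with a concrete value (arbitrary: 2),
# the recipe equivalent of setting hyperparameters before fit()
rec_fixed <- tune::finalize_recipe(rec, tibble(num_comp = 2))

# With no tune() left, prep() and bake() work as usual
prepped <- prep(rec_fixed, training = USArrests)
baked <- bake(prepped, new_data = NULL)
```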
```r
library(recipes)

rec <- recipe(~ ., data = USArrests) %>%
  step_normalize(all_numeric()) %>%
  step_pca(all_numeric(), num_comp = tune::tune())

prep(rec, training = USArrests)
#> Error in `prep()`:
#> ! You cannot `prep()` a tuneable recipe. Argument(s) with `tune()`: 'num_comp'. Do you want to use a tuning function such as `tune_grid()`?
```

Created on 2022-02-22 by the reprex package (v2.0.1)
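A sketch of the usual pattern, under stated assumptions: leave the recipe unprepped, put it in a workflow with a model, and let `tune_grid()` prep and bake it once per candidate value on each resample. `finalize_workflow()` then fills in the best value, and the fitted workflow contains a prepped recipe that you can `bake()` as in the question. To keep it self-contained, `mtcars` with a regression model stands in for the Kaggle churn data, and `step_pca()`'s `num_comp` stands in for `step_select_roc()`'s `top_p`. For the churn recipe you would swap in a classification model such as `logistic_reg()` and a metric such as `roc_auc`, assuming `recipeselectors` registers `top_p` as tunable.

```r
library(tidymodels)

# Assumption: mtcars + step_pca(num_comp) stand in for the churn data + step_select_roc(top_p)
set.seed(1972)
split <- initial_split(mtcars, prop = 0.8)
train <- training(split)
test <- testing(split)
folds <- vfold_cv(train, v = 5)

# Define the recipe but do NOT prep() it; num_comp stays a tune() placeholder
rec <- recipe(mpg ~ ., data = train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = tune())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# tune_grid() preps the recipe on each analysis set and bakes the
# assessment set, once per candidate value of num_comp
res <- tune_grid(wf, resamples = folds, grid = tibble(num_comp = 1:4))

# Fill in the best value, then fit on the whole training set
best <- select_best(res, metric = "rmse")
final_wf <- finalize_workflow(wf, best)
final_fit <- fit(final_wf, data = train)

# The fitted workflow holds a prepped recipe, so bake() works as in the question
prepped <- extract_recipe(final_fit)
train_baked <- bake(prepped, new_data = train)
test_baked <- bake(prepped, new_data = test)
```

If you only need a test-set performance estimate rather than the baked data, `last_fit(final_wf, split)` is an alternative to the final `fit()` and `bake()` steps.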