Getting more information about a C5.0 model in tidymodels

Here's a simple modelling workflow using the palmerpenguins dataset:

library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
#> Warning: package 'parsnip' was built under R version 4.1.3
library(rules)
#> 
#> Attaching package: 'rules'
#> The following object is masked from 'package:dials':
#> 
#>     max_rules
library(palmerpenguins)
#> Warning: package 'palmerpenguins' was built under R version 4.1.3


set.seed(2022)
penguins_split <- initial_split(penguins)
penguins_training <- training(penguins_split)
penguins_testing <- testing(penguins_split)

folds <- vfold_cv(penguins_training, v = 3)

simple_rec <- penguins_training %>% 
  recipe(species ~ .)

C5_model <- C5_rules() %>% 
  set_engine("C5.0")

penguins_wf <- workflow() %>% 
  add_recipe(simple_rec) %>% 
  add_model(C5_model)

penguins_no_tuning <- fit_resamples(
  object = penguins_wf,
  resamples = folds,
  control = control_resamples(save_pred = TRUE, verbose = TRUE)
)
#> Warning: package 'C50' was built under R version 4.1.3
#> i Fold1: preprocessor 1/1
#> v Fold1: preprocessor 1/1
#> i Fold1: preprocessor 1/1, model 1/1
#> v Fold1: preprocessor 1/1, model 1/1
#> i Fold1: preprocessor 1/1, model 1/1 (predictions)
#> i Fold2: preprocessor 1/1
#> v Fold2: preprocessor 1/1
#> i Fold2: preprocessor 1/1, model 1/1
#> v Fold2: preprocessor 1/1, model 1/1
#> i Fold2: preprocessor 1/1, model 1/1 (predictions)
#> i Fold3: preprocessor 1/1
#> v Fold3: preprocessor 1/1
#> i Fold3: preprocessor 1/1, model 1/1
#> v Fold3: preprocessor 1/1, model 1/1
#> i Fold3: preprocessor 1/1, model 1/1 (predictions)

collect_metrics(penguins_no_tuning)
#> # A tibble: 2 x 6
#>   .metric  .estimator  mean     n std_err .config             
#>   <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
#> 1 accuracy multiclass 0.977     3 0.0134  Preprocessor1_Model1
#> 2 roc_auc  hand_till  0.985     3 0.00976 Preprocessor1_Model1

penguins_final_fit <- penguins_wf %>%
  last_fit(split = penguins_split)

collect_metrics(penguins_final_fit)
#> # A tibble: 2 x 4
#>   .metric  .estimator .estimate .config             
#>   <chr>    <chr>          <dbl> <chr>               
#> 1 accuracy multiclass     0.942 Preprocessor1_Model1
#> 2 roc_auc  hand_till      0.990 Preprocessor1_Model1

# Display rules
extract_fit_engine(penguins_final_fit) %>% 
  summary()
#> 
#> Call:
#> C5.0.default(x = x, y = y, trials = trials, rules = TRUE, control
#>  = C50::C5.0Control(minCases = minCases, seed = sample.int(10^5,
#>  1), earlyStopping = FALSE))
#> 
#> 
#> C5.0 [Release 2.07 GPL Edition]      Thu Mar 17 17:42:59 2022
#> -------------------------------
#> 
#> Class specified by attribute `outcome'
#> 
#> Read 258 cases (8 attributes) from undefined.data
#> 
#> Rules:
#> 
#> Rule 1: (73, lift 2.3)
#>  island in {Biscoe, Torgersen}
#>  flipper_length_mm <= 206
#>  ->  class Adelie  [0.987]
#> 
#> Rule 2: (98/18, lift 1.8)
#>  island in {Dream, Torgersen}
#>  bill_length_mm <= 46.5
#>  ->  class Adelie  [0.810]
#> 
#> Rule 3: (53, lift 4.7)
#>  island = Dream
#>  bill_length_mm > 42.2
#>  ->  class Chinstrap  [0.982]
#> 
#> Rule 4: (90, lift 2.8)
#>  island = Biscoe
#>  flipper_length_mm > 206
#>  ->  class Gentoo  [0.989]
#> 
#> Default class: Gentoo
#> 
#> 
#> Evaluation on training data (258 cases):
#> 
#>          Rules     
#>    ----------------
#>      No      Errors
#> 
#>       4    1( 0.4%)   <<
#> 
#> 
#>     (a)   (b)   (c)    <-classified as
#>    ----  ----  ----
#>     113                (a): class Adelie
#>       1    53          (b): class Chinstrap
#>                  91    (c): class Gentoo
#> 
#> 
#>  Attribute usage:
#> 
#>   99.61% island
#>   63.18% flipper_length_mm
#>   51.94% bill_length_mm
#> 
#> 
#> Time: 0.0 secs

# Model information?
extract_workflow(penguins_final_fit) 
#> == Workflow [trained] ==========================================================
#> Preprocessor: Recipe
#> Model: C5_rules()
#> 
#> -- Preprocessor ----------------------------------------------------------------
#> 0 Recipe Steps
#> 
#> -- Model -----------------------------------------------------------------------
#> C5.0 Model Specification ()

extract_workflow(penguins_final_fit) %>% 
  summary()
#>         Length Class      Mode   
#> pre     2      stage_pre  list   
#> fit     2      stage_fit  list   
#> post    1      stage_post list   
#> trained 1      -none-     logical

Created on 2022-03-17 by the reprex package (v2.0.1)

I have three questions:

1. When displaying the rules, it says "Evaluation on training data", but penguins_final_fit was fitted on the test data using last_fit(). How do I get the model to output "Evaluation on test data" instead?
2. How do I get more information from the C5 model, e.g. how deep did the tree go before pruning? extract_fit_engine() and extract_workflow() don't provide that information.
3. The auxiliary parameters for C5.0 listed here - can these be tuned and, if so, where should these arguments be added? I took a look at ?C50::C5.0Control and still didn't understand how to implement these in a tidymodels framework.

Solution

On your first question: last_fit() fits the workflow on the training data and then evaluates it on the testing data. In the output of last_fit(), the metrics and predictions come from the testing data, while the fitted workflow was trained on the training data. That is why the C5.0 summary, which is produced by the fitted engine object, reports "Evaluation on training data". You can read more about using the test set.
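To make the train/test distinction concrete, here is a minimal sketch of how the test-set results could be pulled out of the last_fit() object. It rebuilds the workflow from the question so it runs on its own; the object names test_preds, fitted_wf and manual_preds are only illustrative. The C5.0 summary itself will still say "Evaluation on training data", since that text is printed by the engine for the data it was fitted on.

```r
library(tidymodels)
library(rules)
library(palmerpenguins)

set.seed(2022)
penguins_split <- initial_split(penguins)

penguins_wf <- workflow() %>%
  add_recipe(recipe(species ~ ., data = training(penguins_split))) %>%
  add_model(C5_rules() %>% set_engine("C5.0"))

# Fits on the training portion, evaluates on the testing portion
penguins_final_fit <- last_fit(penguins_wf, split = penguins_split)

# Predictions that last_fit() made on the held-out test set
test_preds <- collect_predictions(penguins_final_fit)

# Confusion matrix for the test set (the analogue of the table in summary())
conf_mat(test_preds, truth = species, estimate = .pred_class)

# The same evaluation done by hand with the trained workflow
fitted_wf <- extract_workflow(penguins_final_fit)
manual_preds <- augment(fitted_wf, new_data = testing(penguins_split))
accuracy(manual_preds, truth = species, estimate = .pred_class)
```

The accuracy computed by hand should agree with the value reported by collect_metrics(penguins_final_fit), because both are based on the same test rows and the same fitted model.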
On your third question: you have surfaced a bug in how we handle tuning engine-specific arguments in parsnip extension packages. I know this is inconvenient for you, but thank you for the report!
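In the meantime, the main arguments of C5_rules() (trees and min_n) can be marked for tuning in the usual way. The following is only a sketch, assuming the bug is limited to engine-specific arguments as described above; the grid values and the names C5_tune_model, penguins_tune_wf and C5_grid are arbitrary choices for illustration, not recommendations.

```r
library(tidymodels)
library(rules)
library(palmerpenguins)

set.seed(2022)
penguins_split <- initial_split(penguins)
penguins_training <- training(penguins_split)
folds <- vfold_cv(penguins_training, v = 3)

# Mark the two main arguments of C5_rules() for tuning
C5_tune_model <- C5_rules(trees = tune(), min_n = tune()) %>%
  set_engine("C5.0")

penguins_tune_wf <- workflow() %>%
  add_recipe(recipe(species ~ ., data = penguins_training)) %>%
  add_model(C5_tune_model)

# A small hand-written grid (illustrative values only)
C5_grid <- tidyr::crossing(trees = c(1, 5, 10), min_n = c(2, 5, 10))

penguins_tuned <- tune_grid(
  penguins_tune_wf,
  resamples = folds,
  grid = C5_grid
)

# Inspect the candidates and finalize the workflow with the best one
show_best(penguins_tuned, metric = "accuracy")
best_params <- select_best(penguins_tuned, metric = "accuracy")
final_wf <- finalize_workflow(penguins_tune_wf, best_params)
```

The finalized workflow can then be passed to last_fit() with the original split, exactly as in the question.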