Tags: r, machine-learning, tidymodels

Machine learning model based on the tidymodels package seems to calculate metrics incorrectly


I have simulated data as follows:

require(tidymodels)

set.seed(123)
x1=rnorm(1000,0,1)
x2=rnorm(1000,0.5,1)
x3=rnorm(1000,2.5,1)
x4=rnorm(1000,2.5,1.5)

y=exp(0.3*x1 + 0.25*x2 + 0.75)/(1+exp(0.3*x1 + 0.25*x2 + 0.75))
y_bin=ifelse(y>0.8,1,0)


data_sim=data.frame(x1,x2=2.5*x2,x3=3*x3, x4,y_bin=factor(y_bin))

Based on this data, I fitted the following lasso logistic regression model using 5-fold cross-validation:

set.seed(123)
fold_data=vfold_cv(data_sim,v=5)

lasso_spec<- logistic_reg( penalty = tune(),
                           mixture = 1) %>% set_engine("glmnet")

lamd_grid <- grid_regular(penalty(), levels =50)

res_basic <- recipe(y_bin ~ x1 +x2+x3 + x4  , data=data_sim)

wf <- workflow() %>% add_model(lasso_spec) %>% add_recipe(res_basic)

lasso_tune <- wf %>%
              tune_grid(
                resamples=fold_data,
                grid=lamd_grid,
                control=control_grid(verbose = FALSE, save_pred = T),
                metrics=metric_set(roc_auc, sensitivity, precision, recall)
                
              )

The best models based on the precision metric have the following values:

lasso_tune %>% show_best("precision")
# A tibble: 5 × 7
   penalty .metric   .estimator  mean     n std_err .config              
     <dbl> <chr>     <chr>      <dbl> <int>   <dbl> <chr>                
1 1   e-10 precision binary     0.997     5 0.00138 Preprocessor1_Model01
2 1.60e-10 precision binary     0.997     5 0.00138 Preprocessor1_Model02
3 2.56e-10 precision binary     0.997     5 0.00138 Preprocessor1_Model03
4 4.09e-10 precision binary     0.997     5 0.00138 Preprocessor1_Model04
5 6.55e-10 precision binary     0.997     5 0.00138 Preprocessor1_Model05

At the same time, I extracted the predictions from each fold as follows:

# p1 is not shown above; presumably it holds the selected best penalty,
# e.g. p1 <- lasso_tune %>% select_best("precision")
fod1=lasso_tune$.predictions[[1]] %>% filter(penalty %in% p1$penalty) %>% 
      select(.pred_class, y_bin)

fod2=lasso_tune$.predictions[[2]] %>% filter(penalty %in% p1$penalty) %>% 
  select(.pred_class, y_bin)

fod3=lasso_tune$.predictions[[3]] %>% filter(penalty %in% p1$penalty) %>% 
  select(.pred_class, y_bin)

fod4=lasso_tune$.predictions[[4]] %>% filter(penalty %in% p1$penalty) %>% 
  select(.pred_class, y_bin)

fod5=lasso_tune$.predictions[[5]] %>% filter(penalty %in% p1$penalty) %>% 
  select(.pred_class, y_bin)

fold_cmb=data.frame(rbind(fod1,fod2,fod3,fod4,fod5))
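
The same pooled out-of-fold predictions can also be obtained in one step with collect_predictions() (a sketch, assuming save_pred = TRUE in control_grid() as above; fold_cmb_alt is just an illustrative name):

fold_cmb_alt = collect_predictions(lasso_tune) %>% 
  filter(penalty %in% p1$penalty) %>% 
  select(.pred_class, y_bin)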

The cross-tabulation based on the combined results looks like this:

> table(fold_cmb$.pred_class,fold_cmb$y_bin)
   
      0   1
  0 885   3
  1   1 111

Based on this, the precision should be 111/112 = 0.991, not 0.997. I guess the package calculates the precision for the wrong class label: you can see that for y_bin = 0, 885/888 = 0.997.
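
Working the numbers from the table above by hand for each possible "event" class:

# precision if class 1 is treated as the event
111 / (111 + 1)   # = 0.991
# precision if class 0 is treated as the event
885 / (885 + 3)   # = 0.997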

So, can anyone help me figure out how to change the code so that it calculates the metrics correctly? Thank you.


Solution

  • You are getting those results because precision(), like all the other {yardstick} metrics, uses the first factor level as the "event". Thus 0 is considered the event for your data, due to how the factor was constructed.

    To get the results you want, you can:

    Set levels at the beginning

    Using the levels argument of factor(), you can specify which level comes first and is thus considered the "event" by {yardstick}.

    Doing this

    factor(y_bin, levels = c(1, 0))
    

    instead of this

    factor(y_bin)
    
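    For the simulated data from the question, that would look like this (a sketch; only the factor construction changes):

    data_sim = data.frame(x1, x2 = 2.5*x2, x3 = 3*x3, x4,
                          y_bin = factor(y_bin, levels = c(1, 0)))

    levels(data_sim$y_bin)
    #> [1] "1" "0"
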

    Use the event_level argument

    The metric functions that are affected by level ordering have an event_level argument that you can use to specify how the calculations should be done:

    library(yardstick)
    
    # event_level defaults to "first"
    two_class_example |>
      precision(truth = truth, estimate = predicted)
    #> # A tibble: 1 × 3
    #>   .metric   .estimator .estimate
    #>   <chr>     <chr>          <dbl>
    #> 1 precision binary         0.819
    
    two_class_example |>
      precision(truth = truth, estimate = predicted, event_level = "second")
    #> # A tibble: 1 × 3
    #>   .metric   .estimator .estimate
    #>   <chr>     <chr>          <dbl>
    #> 1 precision binary         0.861
    

    The control_grid() function also has this argument, so you can get the desired results when running {tune} functions.

    Do this

    control_grid(verbose = FALSE, save_pred = TRUE, event_level = "second")
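
    For example, plugging it into the tuning call from the question (a sketch; nothing else changes):

    lasso_tune <- wf %>%
      tune_grid(
        resamples = fold_data,
        grid = lamd_grid,
        control = control_grid(verbose = FALSE, save_pred = TRUE,
                               event_level = "second"),
        metrics = metric_set(roc_auc, sensitivity, precision, recall)
      )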