In the R tidymodels ecosystem, is there a collect_metrics() equivalent for a simple fitted model that does not use resampling?


In the tidymodels ecosystem, is there an equivalent to collect_metrics() that will evaluate model performance on a training dataset without using resampling?

Why?

The collect_metrics() function is a lovely way to extract model performance metrics when resampling. I am teaching, and I would love to apply collect_metrics() to simple fit() models to make the point about how overly optimistic the results are when you evaluate on your training data.

Showing the process of fitting the model to the training data and calling model evaluation functions (e.g., accuracy() and roc_auc() for a logistic model) is a useful but very distracting tangent that I am trying to avoid. I am thinking I could build a function that would call the "default" collect_metrics() metrics on a "model_fit" object, but I am hoping somebody beat me to it.


Solution

  • You can do it in two lines via augment() and a metric set:

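    library(tidymodels)

    # Resolve function-name conflicts in favor of the tidymodels packages
    tidymodels_prefer()
    # Printing options only; they do not change the results
    options(pillar.advice = FALSE, pillar.min_title_chars = Inf)

    # Example two-class data set (from the modeldata package)
    data("two_class_dat")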
    
    mod_fit <- 
      logistic_reg() %>% 
      fit(Class ~ ., data = two_class_dat)
    
    # Make your own metric set
    some_metrics <- metric_set(accuracy, roc_auc)
    
    # Get predictions on the training set
    augment(mod_fit, new_data = two_class_dat) %>% 
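      # Evaluate the metric set: roc_auc uses the class probability
      # (.pred_Class1), accuracy uses the hard prediction (.pred_class)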
      some_metrics(Class, .pred_Class1, estimate = .pred_class)
    #> # A tibble: 2 × 3
    #>   .metric  .estimator .estimate
    #>   <chr>    <chr>          <dbl>
    #> 1 accuracy binary         0.819
    #> 2 roc_auc  binary         0.888
    

    Created on 2022-11-29 by the reprex package (v2.0.1)
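
If you want something closer to a one-liner for teaching, the two steps above can be wrapped in a small helper. This is only a sketch: `fit_metrics()` is a made-up name, not a tidymodels function, and its `truth`, `prob`, and `metric_fns` arguments are assumptions for illustration. It assumes a classification model, and you still have to tell it which probability column to use (here `.pred_Class1`, the first level of `Class`).

```r
library(tidymodels)

# Hypothetical helper (not part of tidymodels): score a fitted
# classification model on a data set without any resampling.
#   model      - a fitted parsnip model ("model_fit") or workflow
#   data       - data to predict on (e.g. the training set)
#   truth      - unquoted name of the outcome column
#   prob       - unquoted name of the class-probability column
#   metric_fns - a yardstick metric set; accuracy + roc_auc by default
fit_metrics <- function(model, data, truth, prob,
                        metric_fns = metric_set(accuracy, roc_auc)) {
  augment(model, new_data = data) %>%
    metric_fns(truth = {{ truth }}, {{ prob }}, estimate = .pred_class)
}

# Usage with the same model as above
data("two_class_dat")

mod_fit <-
  logistic_reg() %>%
  fit(Class ~ ., data = two_class_dat)

train_res <- fit_metrics(mod_fit, two_class_dat,
                         truth = Class, prob = .pred_Class1)
train_res
```

Since this is evaluated on the same rows the model was fit to, it should return the same optimistic training-set numbers as the reprex above, which is the contrast you would then set against collect_metrics() on resamples.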