Tags: r, purrr, tidymodels, rsample

Compute Gini Index on a nested/rsplit object


I used the rsample::bootstraps() function to create a nested object as follows:

Sampled_Data <- bootstraps(credit_data, times = 2, strata = "Home", apparent = TRUE)

This is what I get:

  splits                id        
  <list>                <chr>     
1 <split [34338/12635]> Bootstrap1
2 <split [34338/12592]> Bootstrap2
3 <split [34338/34338]> Apparent  

I would like to compute the Gini index from the columns "Status" and "Expenses" for each of the bootstrapped data frames, like this:

library(pROC)
2*auc(credit_data$Status,credit_data$Expenses)-1

The problem is that I don't know how to do this without unnesting and writing a for loop.

It seems the purrr package would be useful here, but I'm not familiar with it.

What I would like to have:

  splits                id            Gini
  <list>                <chr>     
1 <split [34338/12635]> Bootstrap1    x
2 <split [34338/12592]> Bootstrap2    y
3 <split [34338/34338]> Apparent      z

Any help?

Thanks


Solution

  • I'll assume that you want to bootstrap this to get confidence intervals.

    apparent = TRUE is only needed for some types of intervals (the t- and BCa-intervals from int_t() and int_bca()). The percentile intervals used here don't need it, so I'll omit it.

    library(tidymodels)
    tidymodels_prefer()
    
    data("credit_data")
    
    # See ?int_pctl and
    # https://www.tidymodels.org/learn/statistics/bootstrap
    # for more info. 
    get_gini <- function(split) {
      dat <- analysis(split)
      roc_res <- roc_auc(dat, truth = Status, Expenses)
      # Convert to gini stat
      roc_res %>% 
        mutate(
          .metric = "gini",
          .estimate = 2 * .estimate - 1
        ) %>% 
        # now use the same format as `tidy()`
        select(estimate = .estimate, term = .metric)
    }
    
    set.seed(1)
    # Set times higher for bootstrap intervals
    bts <- 
      bootstraps(credit_data, times = 50) %>% 
      mutate(gini = map(splits, get_gini))
    
    int_pctl(bts, gini)
    #> Warning: Recommend at least 1000 non-missing bootstrap resamples for term
    #> `gini`.
    #> # A tibble: 1 × 6
    #>   term   .lower .estimate .upper .alpha .method   
    #>   <chr>   <dbl>     <dbl>  <dbl>  <dbl> <chr>     
    #> 1 gini  -0.0463  -0.00173 0.0377   0.05 percentile
    

    Created on 2023-07-17 with reprex v2.0.2
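
If the goal is only the table from the question (one numeric Gini value per row, not intervals), the same idea can be sketched with map_dbl(), which returns a plain numeric column instead of a list column. This is a minimal sketch, not part of the answer above. The helper name gini_value is hypothetical, it uses the credit_data set from modeldata instead of the asker's own data, and it leaves out strata for simplicity.

```r
library(tidymodels)

data("credit_data")

# Hypothetical helper: returns a single number (2 * AUC - 1) for one split
gini_value <- function(split) {
  dat <- analysis(split)
  auc <- roc_auc(dat, truth = Status, Expenses)$.estimate
  2 * auc - 1
}

set.seed(1)
bts_gini <-
  bootstraps(credit_data, times = 2, apparent = TRUE) %>%
  # map_dbl() collapses the per-split results into a numeric column
  mutate(Gini = map_dbl(splits, gini_value))

bts_gini
```

Values may differ in sign from the pROC version in the question: roc_auc() treats the first factor level as the event by default, while pROC chooses the comparison direction automatically unless told otherwise.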