Tags: r, machine-learning, tidymodels

What is the correct syntax for specifying probability estimates in probably::cal_plot_breaks?


I have a data frame of predicted class probabilities and true labels, and I cannot find a way to pass the class probabilities to probably::cal_plot_breaks() that produces neither an error nor a warning. Is this a bug, or am I doing something wrong?

Here is my reproducible code:

library(tidyverse)
library(probably)
#> 
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#> 
#>     as.factor, as.ordered

set.seed(100)

test_df <- tibble(
  probability_x = runif(100),
  probability_y = 1-probability_x,
  Label = sample(
    c("x", "y"), 100, replace = TRUE
  ) %>% as.factor()
)

produces_error <- test_df %>% 
  cal_plot_breaks(
    truth = Label,
    estimate = probability_x
  )
#> Error in `purrr::map()`:
#> ℹ In index: 2.
#> Caused by error in `estimate_str[[.x]]`:
#> ! subscript out of bounds
#> Backtrace:
#>      ▆
#>   1. ├─test_df %>% cal_plot_breaks(truth = Label, estimate = probability_x)
#>   2. ├─probably::cal_plot_breaks(., truth = Label, estimate = probability_x)
#>   3. ├─probably:::cal_plot_breaks.data.frame(., truth = Label, estimate = probability_x)
#>   4. │ └─probably:::cal_plot_breaks_impl(...)
#>   5. │   ├─probably::.cal_table_breaks(...)
#>   6. │   └─probably:::.cal_table_breaks.data.frame(...)
#>   7. │     └─probably:::.cal_table_breaks_impl(...)
#>   8. │       └─probably:::truth_estimate_map(...)
#>   9. │         └─purrr::map(seq_along(truth_levels), ~sym(estimate_str[[.x]]))
#>  10. │           └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
#>  11. │             ├─purrr:::with_indexed_errors(...)
#>  12. │             │ └─base::withCallingHandlers(...)
#>  13. │             ├─purrr:::call_with_cleanup(...)
#>  14. │             └─probably (local) .f(.x[[i]], ...)
#>  15. │               └─rlang::sym(estimate_str[[.x]])
#>  16. │                 └─rlang::is_symbol(x)
#>  17. └─purrr (local) `<fn>`(`<sbscOOBE>`)
#>  18.   └─cli::cli_abort(...)
#>  19.     └─rlang::abort(...)

produces_warning <- test_df %>% 
  cal_plot_breaks(
    truth = Label,
    estimate = starts_with("probability")
  )
#> Warning: Multiple class columns identified. Using: `probability_x`

Created on 2023-06-23 with reprex v2.0.2
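The backtrace points at the cause. It shows probably:::truth_estimate_map() evaluating purrr::map(seq_along(truth_levels), ~sym(estimate_str[[.x]])), with the error raised "In index: 2". The sketch below reruns just that expression in plain R to illustrate the failure. It assumes, based only on the variable names in the backtrace, that truth_levels holds the levels of Label and estimate_str holds the names of the columns selected by estimate. It is not probably's actual implementation.

```r
# Assumed stand-ins for the internal variables named in the backtrace:
# the levels of `Label` and the names of the columns picked by `estimate`
truth_levels <- c("x", "y")
estimate_str <- "probability_x"   # what `estimate = probability_x` selects

# Same expression as in the backtrace: one symbol per truth level.
# Only one column name is available, so index 2 is out of bounds.
one_col <- tryCatch(
  purrr::map(seq_along(truth_levels), ~ rlang::sym(estimate_str[[.x]])),
  error = function(e) e
)
print(conditionMessage(one_col))

# With one selected column per class level, the same expression succeeds
estimate_str <- c("probability_x", "probability_y")
two_cols <- purrr::map(seq_along(truth_levels), ~ rlang::sym(estimate_str[[.x]]))
print(two_cols)
```

With a single non-.pred_ column there is nothing at index 2, which matches the error above. Selecting both columns gets past this step, which is consistent with the second call producing only a warning.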


Solution

  • This works for me, with the probability columns named using the .pred_ prefix:

    library(tidyverse)
    library(probably)
    #> 
    #> Attaching package: 'probably'
    #> The following objects are masked from 'package:base':
    #> 
    #>     as.factor, as.ordered
    
    set.seed(100)
    test_df <- tibble(
      .pred_x = runif(100),
      .pred_y = 1 - .pred_x,
      Label = as.factor(case_when(.pred_x > 0.5 ~ "x", TRUE ~ "y"))
    )
    
    cal_plot_breaks(test_df, Label, .pred_x)
    

    Created on 2023-07-06 with reprex v2.0.2

    However, I think this is a bug: if the column is renamed from .pred_x to probability_x, the same call fails. I've opened an issue here.
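Until the naming issue is resolved, one possible workaround is to rename the question's columns to the .pred_ prefix before calling cal_plot_breaks(). This is a sketch that assumes a probably version behaving like the one in the answer's reprex. It also assumes that the default event level is the first factor level ("x"). dplyr::rename_with() is just one way to do the renaming.

```r
library(dplyr)
library(probably)

set.seed(100)
test_df <- tibble(
  probability_x = runif(100),
  probability_y = 1 - probability_x,
  Label = factor(sample(c("x", "y"), 100, replace = TRUE))
)

# Rename probability_x / probability_y to .pred_x / .pred_y so the column
# names follow the `.pred_<level>` convention used in the working answer
renamed_df <- test_df %>%
  rename_with(~ sub("^probability_", ".pred_", .x), starts_with("probability_"))

# As in the answer, pass only the column for level "x"
# (assumed here to be the event class under the default event_level)
p <- cal_plot_breaks(renamed_df, truth = Label, estimate = .pred_x)
```

Both renamed columns stay in the data frame, as in the answer's example. Only the estimate selection is limited to .pred_x.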