I have a data frame of predicted class probabilities and true labels, but I cannot find a way to pass the class probabilities to probably::cal_plot_breaks
that does not produce either an error or a warning. Is this a bug, or am I doing something wrong?
Here is my reproducible code:
library(tidyverse)
library(probably)
#>
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#>
#> as.factor, as.ordered
set.seed(100)
test_df <- tibble(
  probability_x = runif(100),
  probability_y = 1 - probability_x,
  Label = sample(
    c("x", "y"), 100, replace = TRUE
  ) %>% as.factor()
)
produces_error <- test_df %>%
  cal_plot_breaks(
    truth = Label,
    estimate = probability_x
  )
#> Error in `purrr::map()`:
#> ℹ In index: 2.
#> Caused by error in `estimate_str[[.x]]`:
#> ! subscript out of bounds
#> Backtrace:
#> ▆
#> 1. ├─test_df %>% cal_plot_breaks(truth = Label, estimate = probability_x)
#> 2. ├─probably::cal_plot_breaks(., truth = Label, estimate = probability_x)
#> 3. ├─probably:::cal_plot_breaks.data.frame(., truth = Label, estimate = probability_x)
#> 4. │ └─probably:::cal_plot_breaks_impl(...)
#> 5. │ ├─probably::.cal_table_breaks(...)
#> 6. │ └─probably:::.cal_table_breaks.data.frame(...)
#> 7. │ └─probably:::.cal_table_breaks_impl(...)
#> 8. │ └─probably:::truth_estimate_map(...)
#> 9. │ └─purrr::map(seq_along(truth_levels), ~sym(estimate_str[[.x]]))
#> 10. │ └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
#> 11. │ ├─purrr:::with_indexed_errors(...)
#> 12. │ │ └─base::withCallingHandlers(...)
#> 13. │ ├─purrr:::call_with_cleanup(...)
#> 14. │ └─probably (local) .f(.x[[i]], ...)
#> 15. │ └─rlang::sym(estimate_str[[.x]])
#> 16. │ └─rlang::is_symbol(x)
#> 17. └─purrr (local) `<fn>`(`<sbscOOBE>`)
#> 18. └─cli::cli_abort(...)
#> 19. └─rlang::abort(...)
produces_warning <- test_df %>%
  cal_plot_breaks(
    truth = Label,
    estimate = starts_with("probability")
  )
#> Warning: Multiple class columns identified. Using: `probability_x`
Created on 2023-06-23 with reprex v2.0.2
This works for me:
library(tidyverse)
library(probably)
#>
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#>
#> as.factor, as.ordered
set.seed(100)
test_df <- tibble(
  .pred_x = runif(100),
  .pred_y = 1 - .pred_x,
  Label = as.factor(case_when(.pred_x > 0.5 ~ "x", TRUE ~ "y"))
)
cal_plot_breaks(test_df, Label, .pred_x)
Created on 2023-07-06 with reprex v2.0.2
But I think there is some kind of bug, because the same call stops working if the column is named probability_x instead of .pred_x. I've opened an issue here.
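Until that is resolved, one possible workaround is to rename the probability columns so they carry the `.pred_` prefix followed by the class level, since that naming is what worked above. This is only a sketch based on that observation, not confirmed behaviour of the package: `renamed_df` is a name I made up for illustration, and I'm assuming the `probability_` prefix from the question. The plotting call is guarded so the renaming step runs even where probably is not installed.

```r
set.seed(100)

# Same shape as the data in the question: two probability columns and a factor label.
probability_x <- runif(100)
test_df <- data.frame(
  probability_x = probability_x,
  probability_y = 1 - probability_x,
  Label = factor(sample(c("x", "y"), 100, replace = TRUE))
)

# Rename probability_<level> to .pred_<level> (hypothetical helper name: renamed_df).
renamed_df <- test_df
names(renamed_df) <- sub("^probability_", ".pred_", names(renamed_df))

# Assumption: with the .pred_ prefix, passing a single estimate column works,
# as it did in the example above.
if (requireNamespace("probably", quietly = TRUE)) {
  p <- probably::cal_plot_breaks(renamed_df, truth = Label, estimate = .pred_x)
  print(p)
}
```

The rename uses base R's `sub()` on the column names, so the part of each name after the prefix (the class level) is left untouched.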