logistic_reg() estimates in tidymodels interpretation seem to be predicting incorrect class as positive outcome

I am using tidymodels to develop a logistic regression using the Palmer's Penguins dataset.

I believe that female should be the reference class since it is the first level of my outcome variable, and therefore a positive outcome for the model should female.

When I've built the model, the coefficient estimates seem to indicate that an increase in bill_depth_mm and bill_length_mm result in an increase in the likelihood of being a female, but when plotted the reverse relationship appears to be true.

So this would lead me to believe that male is the predicted outcome for the model, however when it comes to calculating specificity and sensitivity it treats the female outcome as the true positive outcome. Similarly, an ROC plot needs the reference level to be the predicted likelihood of a female outcome.

Why does it appear as though the estimates from the model output are reversed? Is there an option in tidymodels to specify what the estimates should relate to? Or even confirm what the estimates are related to?

Or is my understanding of the model incorrect and these estimates are correct - an increase in bill length and depth does increase the likelihood of the penguin being female?

penguins <- read_csv("Data/penguins.csv")

penguins <- penguins %>% 
  mutate(sex = factor(sex))

set.seed(57475)
penguins_splits <- initial_split(penguins,
                                 strata = sex)

penguins_training <- training(penguins_splits) 
penguins_testing <- testing(penguins_splits)

penguins_recipe <- recipe(sex ~ ., data = penguins_training) %>% 
  step_naomit(all_predictors()) %>%
  step_rm(rowid, skip = TRUE) %>%
  step_string2factor(all_nominal_predictors()) %>%
  step_mutate(year = factor(year)) %>%
  step_dummy(all_nominal_predictors())


logistic_model <- logistic_reg() %>% 
  set_engine("glm") %>%
  set_mode("classification")

logistic_wf <- workflow() %>% 
  add_model(logistic_model) %>% 
  add_recipe(penguins_recipe)

logistic_fit <- logistic_wf %>%
  fit(data = penguins_training)

logistic_fit %>%
  pull_workflow_fit %>%
  tidy()

ggplot(penguins_training, aes(x = bill_length_mm, colour = sex)) +
  geom_boxplot()

Output from the model object

> logistic_fit %>%
+   pull_workflow_fit %>%
+   tidy()
# A tibble: 11 x 5
   term               estimate std.error statistic      p.value
   <chr>                 <dbl>     <dbl>     <dbl>        <dbl>
 1 (Intercept)       -76.7      13.7        -5.58  0.0000000237
 2 bill_length_mm      0.831     0.176       4.72  0.00000238  
 3 bill_depth_mm       1.53      0.391       3.91  0.0000913   
 4 flipper_length_mm  -0.0290    0.0611     -0.474 0.635       
 5 body_mass_g         0.00593   0.00132     4.49  0.00000724  
 6 species_Chinstrap  -9.17      2.06       -4.45  0.00000843  
 7 species_Gentoo     -8.82      3.22       -2.74  0.00608     
 8 island_Dream        0.874     0.968       0.903 0.367       
 9 island_Torgersen   -0.237     0.974      -0.243 0.808       
10 year_X2008         -0.239     0.701      -0.341 0.733       
11 year_X2009         -0.762     0.736      -1.04  0.300

Solution

You're correct in our interpretations - this is due to a clash in conventions between the underlying engine (here stats::glm()) and yardstick. The model uses the first level as "failure" while yardstick uses the first level as "success". To alter the behavior of yardstick, set event_level = "second". (To alter the behavior of glm(), recode the factor.)

https://yardstick.tidymodels.org/reference/sens.html#relevant-level