Difference in output between predict.rpart and predict.glm

I split a dataset up in a training and test sample. I then fit a logit model on the training data to predict the outcome of the test sample. I can do this in two ways:

Using tidymodels (parsnip):

logit_mod <- logistic_reg() %>% 
 set_mode("classification") %>% 
 set_engine("glm") %>%
 fit(y ~ x + z, data=train)
res <- predict(logit_mod, new_data = test, type="prob")

Or with base R's glm():

logit_mod <- glm(y ~ x + z, data=train, family=binomial(link="logit"))
res <- predict(logit_mod, newdata=test, type="response")

The two methods give me different output (probabilities of y), although the model should be the same. Extracting logit_mod[["fit"]] from the tidymodels object gives me the same coefficients as logit_mod fitted with glm.

Why does the second method give me different predicted probabilities?


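  • If you call predict on a binomial glm with type="response", you get the probability of the positive class as a plain numeric vector. tidymodels returns a tibble with one column per class (.pred_0 and .pred_1 here), and a tibble prints its values rounded to three significant digits, so the probabilities only look different.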

    For example, a simple regression with a 0/1 response, 1 being the positive class:

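    library(tidymodels)  # provides logistic_reg(), set_engine(), fit() and %>%

    # simulated data; no seed is set, so your numbers will differ
    df = data.frame(y = factor(rbinom(50,1,0.5)),x=runif(50),z=runif(50))
    train = df[1:40,]
    test = df[41:50,]
    logit_mod <- logistic_reg() %>% 
     set_mode("classification") %>% 
     set_engine("glm") %>%
     fit(y ~ x + z, data=train)
    res <- predict(logit_mod, new_data = test, type="prob")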

    This is the predicted probability for class 1:

           41        42        43        44        45        46        47        48 
    0.3186626 0.3931925 0.4259043 0.3651420 0.6670263 0.6732433 0.5844562 0.5584770 
           49        50 
    0.6791727 0.7567285

    Fit the same model with glm and you can see it's exactly the same:

    fit <- glm(y ~ x + z, data=train, family=binomial)
    res2 <- predict(fit, newdata=test, type="response")
    res2  # print the predicted probabilities
           41        42        43        44        45        46        47        48 
    0.3186626 0.3931925 0.4259043 0.3651420 0.6670263 0.6732433 0.5844562 0.5584770 
           49        50 
    0.6791727 0.7567285
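
Rather than comparing printed output by eye, the two sets of predictions can be compared numerically. The following is a minimal sketch and not part of the original answer. It assumes the tidymodels package is installed, uses an arbitrary seed (so its numbers will not match the output shown above), and introduces the name fit_glm only for illustration. The pillar.sigfig option is what controls how many significant digits a tibble prints.

```r
library(tidymodels)

set.seed(1)  # arbitrary seed (an assumption), only so that reruns agree with each other
df <- data.frame(y = factor(rbinom(50, 1, 0.5)), x = runif(50), z = runif(50))
train <- df[1:40, ]
test <- df[41:50, ]

# tidymodels / parsnip fit: predict() returns a tibble with .pred_0 and .pred_1
logit_mod <- logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm") %>%
  fit(y ~ x + z, data = train)
res <- predict(logit_mod, new_data = test, type = "prob")

# base R fit: predict() returns a named numeric vector for the positive class
fit_glm <- glm(y ~ x + z, data = train, family = binomial)
res2 <- predict(fit_glm, newdata = test, type = "response")

# compare the values themselves, ignoring names and print formatting
same <- all.equal(as.numeric(res$.pred_1), as.numeric(res2))
print(same)

# ask tibbles to print more significant digits before inspecting res
options(pillar.sigfig = 7)
print(res)
```

If the comparison reports a difference here, the two models were probably not fitted on the same data or with the same family, which is worth checking before suspecting predict itself.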