predict.glm on a valid model returning NULL


I want to use predict.glm to return predictions using the same dataset used to train the original model, but I keep getting NULL as my result. I have a valid model, with no rows deleted due to missing values.

My code has many variables and the project is a bit sensitive in nature, so I tried to reproduce my issue with a toy example. However, because I am unsure what is causing the problem, I have been unable to reproduce any NULL output from predict.glm(object, type = "response"). I hope someone who has run into this before can recommend a solution.

library(MASS)
library(tidyverse)


mod1 <- glm(status ~ 
              state + sex + diag + death + T.categ + age,
            family = "binomial", data = Aids2)
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

# The result below occurs because `death` has values that yield a status of "D" 100% of the time

head(predict.glm(mod1, type = "response"))
#> 1 2 3 4 5 6 
#> 1 1 1 1 1 1

#removing `death` as predictor

mod2 <- glm(status ~ 
              state + sex + diag + T.categ + age,
            family = "binomial", data = Aids2)
head(predict.glm(mod2, type = "response"))
#>         1         2         3         4         5         6 
#> 0.4690554 0.4758433 0.9820719 0.9884703 0.9292926 0.9333818

I am unsure what conditions would cause the calls above to return NULL from predict.glm as I have specified it. The results in the code are what I want, but in my actual project I get NULL, even though the same call has returned proper values for me in the past. I realize this isn't a great reproducible example, but I cannot provide details about my actual data. I appreciate any assistance.


Solution

  • SOLUTION: In my original problem (not the toy example above), I was wrapping glm() in summary(). The fix was to make sure that the object argument to predict.glm was the fitted generalized linear model itself, not its summary. I had been careless and assumed that the summary of a glm would have the same class as the glm itself.

    #same as mod1, but wrapping in summary()
    
    mod3 <- summary(glm(status ~ 
                          state + sex + diag + death + T.categ + age,
                        family = "binomial", data = MASS::Aids2))
    #> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
    
    head(predict.glm(mod3, type = "response"))
    #> NULL
    
    mod4 <- summary(glm(status ~ 
                          state + sex + diag + T.categ + age,
                        family = "binomial", data = MASS::Aids2))
    
    head(predict.glm(mod4, type = "response"))
    #> NULL
    

    I appreciate those who took the time and tried to troubleshoot my question.
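
    A minimal sketch of how the mix-up could be caught earlier, assuming the same Aids2 toy data as above. The names fit, fit_summary, err, and preds are illustrative and not from my actual project. The idea is to keep the fitted model and its summary in separate objects, check their classes, and call the generic predict() rather than predict.glm() directly, so that passing the wrong object raises an error instead of quietly returning NULL.

```r
library(MASS)  # provides the Aids2 data set used in the toy example

# Keep the fitted model and its summary in separate objects.
fit <- glm(status ~ state + sex + diag + T.categ + age,
           family = "binomial", data = Aids2)
fit_summary <- summary(fit)

# The two objects have different classes.
class(fit)
#> [1] "glm" "lm"
class(fit_summary)
#> [1] "summary.glm"

# The summary object has no fitted.values component, so predict.glm()
# called on it directly has nothing to return.
is.null(fit_summary$fitted.values)
#> [1] TRUE

# The generic predict() dispatches on class and has no method for
# summary.glm, so the mistake surfaces as an error instead of NULL.
err <- tryCatch(predict(fit_summary, type = "response"),
                error = function(e) conditionMessage(e))
err

# Predictions come from the model object itself.
preds <- predict(fit, type = "response")
head(preds)
```

    Calling predict.glm() by name skips method dispatch, which seems to be why the wrong object went unnoticed in my case.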