r logistic-regression

Different coefficient in LRM vs GLM output


Let me first note that I haven't been able to reproduce this problem on anything other than my own data set. However, here is the general idea. I have a data frame and I'm trying to build a simple logistic regression to understand the marginal effect of Amount on IsWon. Both models perform poorly (there is only one predictor, after all), but they produce two different coefficients.

First is the glm output:

> summary(mod4)

Call:
glm(formula = as.factor(IsWon) ~ Amount, family = "binomial", 
    data = final_data_obj_samp)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2578  -1.2361   1.0993   1.1066   3.7307  

Coefficients:
                  Estimate     Std. Error z value              Pr(>|z|)    
(Intercept)  0.18708622416  0.03142171761  5.9540        0.000000002616 ***
Amount      -0.00000315465  0.00000035466 -8.8947 < 0.00000000000000022 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6928.69  on 4999  degrees of freedom
Residual deviance: 6790.87  on 4998  degrees of freedom
AIC: 6794.87

Number of Fisher Scoring iterations: 6

Notice the negative coefficient for Amount.

And now the output of the lrm function from rms:

Logistic Regression Model

lrm(formula = as.factor(IsWon) ~ Amount, data = final_data_obj_samp, 
    x = TRUE, y = TRUE)
                       Model Likelihood     Discrimination    Rank Discrim.    
                          Ratio Test            Indexes          Indexes       
Obs           5000    LR chi2     137.82    R2       0.036    C       0.633    
 0            2441    d.f.             1    g        0.300    Dxy     0.266    
 1            2559    Pr(> chi2) <0.0001    gr       1.350    gamma   0.288    
max |deriv| 0.0007                          gp       0.054    tau-a   0.133    
                                            Brier    0.242                     

          Coef   S.E.   Wald Z Pr(>|Z|)
Intercept 0.1871 0.0314  5.95  <0.0001 
Amount    0.0000 0.0000 -8.89  <0.0001 

Both models do a poor job, but one estimates a positive coefficient and the other a negative coefficient. Sure, the values are negligible, but can someone help me understand this?

For what it's worth, here's what the plot of the lrm object looks like.

> plot(Predict(mod2, fun=plogis))

[Figure: predicted probability of IsWon versus Amount, from plot(Predict(mod2, fun=plogis))]

The plot shows that the predicted probability of winning has a strongly negative relationship with Amount.


Solution

  • Do not rely on the printed summary to read off coefficients. The table is produced by a print method, so it is always subject to rounding: lrm shows Amount as 0.0000 only because the estimate is too small to register at four decimal places, while its Wald Z of -8.89 already tells you the coefficient is negative. Have you tried mod4$coef (the coefficients of the glm model mod4) and mod2$coef (the coefficients of the lrm model mod2)? It is also a good idea to read the "Value" section of ?glm and ?lrm.
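
To see the rounding effect in isolation, here is a minimal sketch using base R only. The data are simulated (sim_data and its parameter values are made up for illustration and are not the asker's final_data_obj_samp), and round(..., 4) merely stands in for the four-decimal display of the lrm print method.

```r
# Simulated stand-in for the real data (hypothetical values, chosen only so
# that the slope is tiny and negative, as in the question)
set.seed(42)
n <- 5000
Amount <- rexp(n, rate = 1 / 100000)      # large-scale predictor
p <- plogis(0.19 - 3e-06 * Amount)        # true slope: -0.000003
IsWon <- rbinom(n, size = 1, prob = p)
sim_data <- data.frame(IsWon = IsWon, Amount = Amount)

fit <- glm(IsWon ~ Amount, family = binomial, data = sim_data)

# Rounding to four decimals, like a fixed-width coefficient table,
# makes the slope look like zero
rounded <- round(coef(fit), 4)
print(rounded)

# Extracting the stored coefficients keeps the sign and the magnitude
full <- coef(fit)
print(format(full, digits = 6, scientific = TRUE))
print(sign(full["Amount"]))
```

The same extraction should apply to the fitted objects in the question: coef(mod4) and coef(mod2), or equivalently mod4$coefficients and mod2$coefficients, return the unrounded estimates.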