
Difference between glm and LogitModelFit


I have a problem with the glm function in R.

Specifically, I am not sure how to include nominal variables.

The data and the results that I get in R after running the glm function are the following:

> df

   x1 x2 y
1  a  2  0
2  b  4  1
3  a  4  0
4  b  2  1
5  a  4  1
6  b  2  0

> str(df)
'data.frame':   6 obs. of  3 variables:
 $ x1: Factor w/ 2 levels "a","b": 1 2 1 2 1 2
 $ x2: num  2 4 4 2 4 2
 $ y : Factor w/ 2 levels "0","1": 1 2 1 2 2 1

Call:
glm(formula = y ~ x1 + x2, family = "binomial", data = df)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -39.132  15208.471  -0.003    0.998
x1b            19.566   7604.236   0.003    0.998
x2              9.783   3802.118   0.003    0.998

However, when I run the LogitModelFit function in Wolfram Mathematica, I get different parameters.

The code in Wolfram is provided below:

data = {{a, 2, 0}, {b, 4, 1}, {a, 4, 0}, {b, 2, 1}, {a, 4, 1}, {b, 2, 0}};

model = LogitModelFit[data, {x, y}, {x, y}, NominalVariables -> x]

model["BestFitParameters"]

And these are my estimated parameters:

{-18.5661, -18.5661, 9.28303}

model // Normal

1/(1 + E^(18.5661 - 9.28303 y + 18.5661 DiscreteIndicator[x, a, {a, b}]))

So, what is different here? Why do the results differ so much?

Am I doing something wrong in R or in Wolfram?


Solution

  • You effectively have 4 groups (the distinct combinations of x1 and x2), for which you are trying to estimate 3 parameters:

    library(dplyr)
    # y is a factor in df, so compute the share of "1"s explicitly
    df %>% group_by(x1, x2) %>% summarise(n = n(), y = mean(y == "1"))
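
    On the df above this gives, give or take tibble formatting (dplyr may also print a regrouping message):

    # A tibble: 4 x 4
    # Groups:   x1 [2]
      x1       x2     n     y
      <fct> <dbl> <int> <dbl>
    1 a         2     1   0
    2 a         4     2   0.5
    3 b         2     2   0.5
    4 b         4     1   1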
    

    As you can see from the huge standard errors, the parameter estimates aren't stable. The standard errors from Wolfram should also be very large (if it reports them).
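
    One way to see the instability (a minimal sketch, refitting the model from the question): look at the fitted probabilities. Two of the cells are predicted essentially perfectly, which is the quasi-separation that drives the estimates toward +/-Inf:

    > m0 <- glm(y ~ x1 + x2, family = binomial(), data = df)
    > round(fitted(m0), 4)   # roughly 0, 1, 0.5, 0.5, 0.5, 0.5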

    Second, Wolfram seems to use a different reference group for x1:

    > df$x1 <- relevel(df$x1, "b")
    > m <- glm(y ~ x1 + x2, family = binomial(), data = df, control = list(maxit = 100))
    > summary(m)
    
    Call:
    glm(formula = y ~ x1 + x2, family = binomial(), data = df, control = list(maxit = 100))
    
    Deviance Residuals: 
           1         2         3         4         5         6  
    -0.00008   0.00008  -1.17741   1.17741   1.17741  -1.17741  
    
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -19.566   7604.236  -0.003    0.998
    x1a          -19.566   7604.236  -0.003    0.998
    x2             9.783   3802.118   0.003    0.998
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 8.3178  on 5  degrees of freedom
    Residual deviance: 5.5452  on 3  degrees of freedom
    AIC: 11.545
    
    Number of Fisher Scoring iterations: 18
    

    This is much closer to Wolfram's result (it is actually the same model as yours; I just chose another reference group). Note that the estimates are just a reparameterisation: the new intercept is the old intercept plus the old x1b coefficient (-39.132 + 19.566 = -19.566), and x1a = -x1b.

    The predictions from both models (glm and Wolfram) will be practically equal. In fact, any fit whose first two parameters are very large and negative (in the limit, -Inf) and whose third parameter is half their magnitude (9.783 * 2 = 19.566) will give almost the same result; a quick check follows below.
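
    A sketch of that check, plugging in the rounded estimates from both fits by hand (plogis is the inverse logit):

    > x1a  <- df$x1 == "a"
    > p_r  <- plogis(-19.566  - 19.566  * x1a + 9.783   * df$x2)   # R fit, reference level "b"
    > p_wm <- plogis(-18.5661 - 18.5661 * x1a + 9.28303 * df$x2)   # Wolfram's parameters
    > round(cbind(p_r, p_wm), 4)   # both columns: ~0, ~1, 0.5, 0.5, 0.5, 0.5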

    The factor of two originates from the fact that x2 takes on the values 2 and 4, which differ by two; the rescaling sketch below makes this explicit.
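
    A sketch: rescale x2 so that one step between its observed values equals one unit, and the x1 and x2 coefficients end up with the same magnitude:

    > m2 <- glm(y ~ x1 + I(x2 / 2), family = binomial(), data = df,
    +           control = list(maxit = 100))
    > coef(m2)   # the x1 and I(x2/2) coefficients now share the same magnitude (~19.57)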