
Logistic regression without an intercept gives fitting warning message


I am trying to run a logistic regression without an intercept. First I tried the glm function, but I got the following warning:

    Warning message:        
    glm.fit: fitted probabilities numerically 0 or 1 occurred       

Since it is not possible to change the data set at all, given the nature of my work, I decided to try the bayesglm function from a different R package (arm).

When I use this function including the intercept, I get no warning. However, when I exclude the intercept by adding -1 to the formula, I get the same warning as above, together with the following output:

    > regress=bayesglm(y~x1*x2+x3+x4-1, data = DATA, family=binomial(link="logit"), maxit=10000)
    > summary(regress)      

    Call:       
    bayesglm(formula = y ~ x1 * x2 + x3 + x4 - 1, family = binomial(link = "logit"),        
        data = DATA, maxit = 10000)     

    Deviance Residuals:         
         Min        1Q    Median        3Q       Max        
    -1.01451  -0.43143  -0.22778  -0.05431   2.89066        

    Coefficients:       
             Estimate Std. Error z value Pr(>|z|)           
    x1      -20.45537    9.70594  -2.108  0.03507 *         
    x2       -7.04844    2.87415  -2.452  0.01419 *         
    x1:x2     0.13409   17.57010   0.008  0.99391           
    x3       -0.17779    0.06377  -2.788  0.00531 **        
    x4       -0.02593    0.05313  -0.488  0.62548           
    ---     
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1      

    (Dispersion parameter for binomial family taken to be 1)        

        Null deviance: 494.91  on 357  degrees of freedom       
    Residual deviance: 124.93  on 352  degrees of freedom       
      (165 observations deleted due to missingness)     
    AIC: 134.93     

    Number of Fisher Scoring iterations: 123        

and I get the same warning as before:

    Warning message:        
    glm.fit: fitted probabilities numerically 0 or 1 occurred       

This warning does not appear when I leave the intercept in.

Therefore, I have two questions to ask:

1. Can I safely ignore this warning message?

2. If not, how can I fix the problem that the warning points to?


Solution

  • I will try to provide an answer to the question.

    What does the warning mean? The warning is raised when numerical precision may be in question for certain observations. More precisely, it appears when the fitted model returns a probability of 1 - epsilon or, equivalently, 0 + epsilon for some observation, where epsilon is a tolerance on the order of machine precision used internally by the standard glm.fit function (see glm.control for the related control parameters).
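    A minimal way to reproduce the warning is a completely separated, made-up data set, where the outcome is perfectly determined by the predictor and the maximum-likelihood fit pushes the probabilities to the boundary:

```r
# Made-up example: y is 0 exactly when x < 0, so the data are
# completely separated and the fitted probabilities saturate
# at (numerically) 0 and 1.
x <- c(-5, -4, -3, -2, -1, 1, 2, 3, 4, 5)
y <- c( 0,  0,  0,  0,  0, 1, 1, 1, 1, 1)

fit <- glm(y ~ x, family = binomial(link = "logit"))
# typically emits:
#   glm.fit: fitted probabilities numerically 0 or 1 occurred
# (possibly together with a non-convergence warning)

range(fitted(fit))  # fitted values very close to 0 and 1
```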

    When may this happen? In my experience, the case where this happens most often is when factors (or dummy variables) are included for which only one outcome is observed in some category. This occurs most often when interactions between factors with many levels are included and the data for the analysis are limited. A similar picture can arise when there are many variables relative to the number of observations (counting interactions, transformations, etc. as individual variables, so the total is the sum of all of these). In your case, if the model contains factors, removing the intercept changes the coding so that every level of the first factor receives its own dummy column, which can push fitted probabilities toward the edge cases of 0 and 1. In short, if for some part of the data there is no (or very little) uncertainty in the outcome, this warning gives an indication of it.
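    The effect of -1 on factor coding can be seen directly in the design matrix; a small sketch with a hypothetical three-level factor f:

```r
# With an intercept, one factor level is absorbed into the baseline;
# without it, every level gets its own dummy column.
f <- factor(c("a", "a", "b", "b", "c", "c"))

colnames(model.matrix(~ f))      # "(Intercept)" "fb" "fc"
colnames(model.matrix(~ f - 1))  # "fa" "fb" "fc"
```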

    Can I ignore it, and otherwise how can I fix it? This depends on the problem at hand and its scale. Several sources, such as John Fox, would likely consider these observations possible outliers, and argue for removing them after using influence measures (available in the car package for base glm) or performing outlier tests (also available in the car package for base glm), if this is an option within your field of work. If these show that the observations do not influence the fit, you would not remove them, as there would be no statistical argument for doing so.
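    A sketch of those car-based diagnostics on a base glm fit; the data below are simulated stand-ins, not the asker's, and the variable names are illustrative:

```r
library(car)  # assumes the car package is installed

# Simulated stand-in for the asker's data
set.seed(1)
DATA <- data.frame(x1 = rnorm(50), x3 = rnorm(50))
DATA$y <- rbinom(50, 1, plogis(DATA$x1))

fit <- glm(y ~ x1 + x3, data = DATA, family = binomial(link = "logit"))

influenceIndexPlot(fit)  # Cook's distances, hat values, studentized residuals
outlierTest(fit)         # Bonferroni-adjusted test of the largest residual
```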

    If outlier removal is not an option in your field of work, then a reduced model (fewer variables in general) might help if that is the cause; if the number of factor levels is the cause, merging levels within factors might give better results.
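    Merging sparse levels can be done in base R by reassigning the levels; a sketch with a hypothetical factor whose level "c" is rare:

```r
# Collapse levels "b" and "c" into a single level "bc"
f <- factor(c("a", "a", "a", "b", "b", "c"))
levels(f) <- list(a = "a", bc = c("b", "c"))
table(f)
# f
#  a bc
#  3  3
```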

    Other sources may have other suggestions, but John Fox is a credible source on the subject for these model types. It becomes a question of 'Is my model correctly specified?', 'How severely does it affect my model?' and 'How much am I allowed to do in my line of work?', while following the general theory and guidelines within statistics. Probabilities close to 0 and 1 are less likely to be precise and more likely to be due to numerical imprecision, but if these are not the cases you need to predict, and there is no significant effect on the remainder of the model, this is not necessarily a problem and may be ignored.