Search code examples
rregressionprobabilitymultinomialnnet

Probability results from Multinomial Regression nnet package


Good afternoon, I have a problem with the output I get when performing a logistic regression with NNET package. I want to predict Category with HS_TR (Return Period) and SLR (Sea Level Rise). The multinomial model, called fit, has been calculated with the information from the x.sub subset. There are 4 different categories possible 1,2,3 or 4.

x.sub:

   POINTID  HS_TR  SLR  Category
       4     10    0.0     3
       4     10    0.6     4
       4     50    0.0     3
       4     50    0.6     4
       4    100    0.0     4
       4    100    0.6     4

When I run the model> fit <- multinom(Category ~ HS_TR + SLR, x.sub, maxit=3000) I get the results :

Coefficients:
    (Intercept)       HS_TR         SLR 
    -30.5791517   0.4130478  62.0976951 

    Residual Deviance: 0.0001820405 
    AIC: 6.000182

Now that I have the multinomial, I want to know the predicted category for a specific scenario (d3) of SLR and HS_TR. I define d3 and apply the prediction and I get reasonable result:

d3<-data.frame("HS_TR"=c(10),"SLR"=c(0))
prediction <-(predict(fit,d3))

I get

> prediction
[[1]]
[1] 3 
Level: 3

However, when I calculate the probability of getting the prediction prediction <-(predict(fit,d3, type="probs")), I get the following:

> prediction
[[1]]
1 
0 

Which makes no sense since it says that there is probability 0. Since the model I run gives a prediction of the CATEGORY, I don't understand why then, the probability is 0. Does someone know why I get it?

If someone knows how I could work on the problem so that I can solve it. Thank you in advance.


Solution

  • You have a problem with separation / complete separation (Google the term to get more information. This page gives a nice introduction which contains this quote:

    A complete separation happens when the outcome variable separates a predictor variable or a combination of predictor variables completely.

    If you look at your data, for example using

    > xtabs(~ Category + HS_TR + SLR, data=x.sub)
    , , SLR = 0
    
            HS_TR
    Category 10 50 100
           3  1  1   0
           4  0  0   1
    
    , , SLR = 0.6
    
            HS_TR
    Category 10 50 100
           3  0  0   0
           4  1  1   1
    

    then you see that the combination of SLR and HS_TR fully determines the outcome for SLR=0.6. You need to specify a simpler model or get more data to provide a stable fit.

    In your case your output has only two possible categories so you should be able to fit a log linear model or logistic regression model and get the same result. If you create a new variable Cat that is a factor of Category then you see a warning pointing you in the right direction.

    > glm(Cat ~HS_TR + SLR, data=x.sub, family="binomial") 
    Warning message:
    glm.fit: fitted probabilities numerically 0 or 1 occurred 
    

    I think multinom does not detect the problem in the data. However, if you look at the summary of your fit then you find that the standard error of two of the parameter estimates are extremely large. This also suggests that the estimates are not stable and that separation could be a problem.

    > summary(fit)
    Call:
    multinom(formula = Category ~ HS_TR + SLR, data = x.sub, maxit = 3000)
    
    Coefficients:
                     Values  Std. Err.
    (Intercept) -30.5791517 356.932851
    HS_TR         0.4130478   5.137396
    SLR          62.0976951 634.584184
    
    Residual Deviance: 0.0001820405 
    AIC: 6.000182 
    

    I think the convergence checking in multinom lacks a check of some sort.