Good afternoon, I have a problem with the output I get when performing a logistic regression with NNET package. I want to predict Category
with HS_TR (Return Period)
and SLR (Sea Level Rise)
.
The multinomial model, called fit
, has been calculated with the information from the x.sub
subset. There are 4 different categories possible 1,2,3 or 4.
x.sub
:
POINTID HS_TR SLR Category
4 10 0.0 3
4 10 0.6 4
4 50 0.0 3
4 50 0.6 4
4 100 0.0 4
4 100 0.6 4
When I run the model> fit <- multinom(Category ~ HS_TR + SLR, x.sub, maxit=3000)
I get the results :
Coefficients:
(Intercept) HS_TR SLR
-30.5791517 0.4130478 62.0976951
Residual Deviance: 0.0001820405
AIC: 6.000182
Now that I have the multinomial, I want to know the predicted category for a specific scenario (d3) of SLR and HS_TR. I define d3 and apply the prediction and I get reasonable result:
d3<-data.frame("HS_TR"=c(10),"SLR"=c(0))
prediction <-(predict(fit,d3))
I get
> prediction
[[1]]
[1] 3
Level: 3
However, when I calculate the probability of getting the prediction prediction <-(predict(fit,d3, type="probs"))
, I get the following:
> prediction
[[1]]
1
0
Which makes no sense since it says that there is probability 0. Since the model I run gives a prediction of the CATEGORY
, I don't understand why then, the probability is 0. Does someone know why I get it?
If someone knows how I could work on the problem so that I can solve it. Thank you in advance.
You have a problem with separation / complete separation (Google the term to get more information. This page gives a nice introduction which contains this quote:
A complete separation happens when the outcome variable separates a predictor variable or a combination of predictor variables completely.
If you look at your data, for example using
> xtabs(~ Category + HS_TR + SLR, data=x.sub)
, , SLR = 0
HS_TR
Category 10 50 100
3 1 1 0
4 0 0 1
, , SLR = 0.6
HS_TR
Category 10 50 100
3 0 0 0
4 1 1 1
then you see that the combination of SLR
and HS_TR
fully determines the outcome for SLR=0.6
. You need to specify a simpler model or get more data to provide a stable fit.
In your case your output has only two possible categories so you should be able to fit a log linear model or logistic regression model and get the same result. If you create a new variable Cat
that is a factor of Category
then you see a warning pointing you in the right direction.
> glm(Cat ~HS_TR + SLR, data=x.sub, family="binomial")
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
I think multinom
does not detect the problem in the data. However, if you look at the summary
of your fit then you find that the standard error of two of the parameter estimates are extremely large. This also suggests that the estimates are not stable and that separation could be a problem.
> summary(fit)
Call:
multinom(formula = Category ~ HS_TR + SLR, data = x.sub, maxit = 3000)
Coefficients:
Values Std. Err.
(Intercept) -30.5791517 356.932851
HS_TR 0.4130478 5.137396
SLR 62.0976951 634.584184
Residual Deviance: 0.0001820405
AIC: 6.000182
I think the convergence checking in multinom
lacks a check of some sort.