I built a multinomial regression with scikit-learn, and it worked fine.
I then tried to use the same data with statsmodels, as it provides more insight, and it seems to skip the first y value. Any ideas on what I may have done wrong?
I have 6 variables in in_X and 7 possible outcomes in in_y (from y=1 to y=7), but statsmodels returns only 6 sets of coefficients.
When I run print(result.summary()), the output starts at y=2.
Here's the data shape:
in_y.value_counts()
>>>
3 295
4 154
5 125
2 86
6 28
1 5
7 3
Name: y, dtype: int64
in_X.head()
>>>
ENTERPRISE_VALUE_ SALES_GROWTH_ EBIT_TO_INT_EXP_ NET_DEBT_TO_EBITDA_ RETURN_COM_EQY_ CASH_RATIO_
918 4.0 4.0 4.0 4.0 5.0 4.0
344 6.0 3.0 4.0 4.0 4.0 6.0
348 5.0 3.0 3.0 5.0 3.0 6.0
906 4.0 5.0 4.0 4.0 4.0 4.0
80 3.0 4.0 4.0 4.0 4.0 4.0
(696, 6)
The code:
import pandas as pd
import statsmodels.discrete.discrete_model as sm
logit_model = sm.MNLogit(in_y, in_X)
result = logit_model.fit()
# Results analysis
print(result.summary())
out1 = result.params
out1
0 1 2 3 4 5
ENTERPRISE_VALUE_ -0.228684 -1.274831 -2.546053 -3.440249 -3.602911 -3.822631
SALES_GROWTH_ 0.553498 0.706551 1.399920 1.675287 1.646694 1.152329
EBIT_TO_INT_EXP_ -0.036777 -0.304586 -0.895444 -1.351096 -1.614823 -0.593286
NET_DEBT_TO_EBITDA_ 0.772482 1.690700 2.106280 2.881484 3.524116 4.281756
RETURN_COM_EQY_ -0.053659 0.269994 0.487565 0.653377 0.228949 -1.413008
CASH_RATIO_ -0.035479 0.399930 0.808460 0.722607 0.263178 -0.502091
Result summary:
Logit Regression Results
==============================================================================
Dep. Variable: y No. Observations: 696
Model: MNLogit Df Residuals: 660
Method: MLE Df Model: 30
Date: Mon, 01 Oct 2018 Pseudo R-squ.: 0.2390
Time: 12:09:15 Log-Likelihood: -769.38
converged: True LL-Null: -1011.0
LLR p-value: 3.400e-83
=======================================================================================
y=2 coef std err z P>|z| [0.025 0.975]
---------------------------------------------------------------------------------------
[...]
One of the categories has to be dropped as the reference category because the probabilities are restricted to add up to 1. Given the other parameters, the probability of the reference category is just one minus the sum of the non-reference probabilities. MNLogit uses the first category (here y=1) as the reference, which is why the summary starts at y=2.
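Written out, the restriction looks roughly like this. The notation is my own shorthand, not anything from the statsmodels output: x_i is the row of regressors for observation i, and beta_j is the coefficient column reported for category j, with y=1 as the reference.

```latex
P(y_i = j \mid x_i) = \frac{\exp(x_i^\top \beta_j)}{1 + \sum_{k=2}^{7} \exp(x_i^\top \beta_k)}, \qquad j = 2, \dots, 7,
\qquad
P(y_i = 1 \mid x_i) = \frac{1}{1 + \sum_{k=2}^{7} \exp(x_i^\top \beta_k)} = 1 - \sum_{j=2}^{7} P(y_i = j \mid x_i).
```

This is equivalent to fixing beta_1 = 0, so there are only 6 free coefficient columns for 7 outcomes.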
This is the same as in the binary Logit model, where we can estimate only one set of parameters, e.g. for the probability of success. The probability of the other binary outcome, e.g. failure, is just one minus the probability of success.
In both cases the prediction for the response variable is a binary or multinomial probability vector that has to satisfy the restrictions on probabilities, i.e. values between zero and one that add up to one.