Why, in this code, are the coefficients (intercept and x) different between the seaborn regplot logistic visualization and the statsmodels logit() analysis? Shouldn't the two lines at least start at the same intercept? What am I doing wrong?
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.formula.api import logit
np.random.seed(2022) # to get the same data each time
df = pd.DataFrame({
    'y': np.random.randint(2, size=10),
    'x': np.random.rand(10)
})
mdl = logit("y ~ x", data=df).fit()
print(mdl.summary())
sns.regplot(y='y', x='x', data=df, logistic=True, ci=None)
plt.axline(xy1=(0, mdl.params[0]), slope=mdl.params[1], color='black')
plt.show()
Optimization terminated successfully.
         Current function value: 0.665054
         Iterations 5
                           Logit Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   10
Model:                          Logit   Df Residuals:                        8
Method:                           MLE   Df Model:                            1
Date:                Tue, 26 Jul 2022   Pseudo R-squ.:                 0.04053
Time:                        07:43:10   Log-Likelihood:                -6.6505
converged:                       True   LL-Null:                       -6.9315
Covariance Type:            nonrobust   LLR p-value:                    0.4535
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.0253      2.902      0.698      0.485      -3.663       7.713
x             -2.7006      3.741     -0.722      0.470     -10.033       4.632
==============================================================================
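What you are seeing in the sns.regplot() plot is a curve of predicted probabilities, not of logits. The intercept and slope reported by logit() describe a straight line on the logit (log-odds) scale, which is what your plt.axline() call draws, so the two lines are on different scales and need not meet at x = 0. To match the regplot curve using the results of your logit model, you have to compute a probability for each x value using the estimated intercept and slope.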
Probabilities are computed by first computing logits (a linear combination of the estimated intercept, the slope, and the x values):
logits = mdl.params['Intercept'] + mdl.params['x'] * df['x']
and then passing them through the sigmoid function:
probs = np.exp(logits) / (1 + np.exp(logits))
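One caveat about the formula above: np.exp(logits) overflows for very large logits, and the ratio then becomes nan. That cannot happen with the small coefficients in this example, but here is a minimal sketch of an overflow-safe variant in case it helps. The name stable_sigmoid is my own, not a statsmodels or seaborn function; scipy.special.expit is a ready-made alternative.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in np.exp for large |z| (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    # exp(-|z|) always lies in [0, 1], so it cannot overflow
    e = np.exp(-np.abs(z))
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

# Extreme and moderate logits; all results stay finite
z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
probs = stable_sigmoid(z)
print(probs)
```

For moderate logits this gives the same values as np.exp(logits) / (1 + np.exp(logits)), so it can be dropped in without changing the plot.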
Here is the full code, which draws both curves on the same plot:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.formula.api import logit
np.random.seed(2022) # to get the same data each time
df = pd.DataFrame({
    'y': np.random.randint(2, size=10),
    'x': np.random.rand(10)
})
mdl = logit("y ~ x", data=df).fit()
print(mdl.summary())
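# Sort x so the line is drawn left to right instead of jumping between points
x_sorted = df['x'].sort_values()
# Linear predictor (log-odds), using label-based access to the coefficients
logits = mdl.params['Intercept'] + mdl.params['x'] * x_sorted
# Sigmoid: convert log-odds to probabilities
probs = np.exp(logits) / (1 + np.exp(logits))
sns.regplot(y='y', x='x', data=df, logistic=True, ci=None)
plt.plot(x_sorted, probs, color='red')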
plt.show()
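As a quick check on the intercept part of the question, here is a minimal sketch that uses only the rounded coefficients printed in the summary (2.0253 and -2.7006) rather than refitting the model. The helper name sigmoid is my own, not part of statsmodels or seaborn.

```python
import numpy as np

# Coefficients copied (rounded) from the Logit summary table
intercept, slope = 2.0253, -2.7006

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    # Algebraically equal to exp(z) / (1 + exp(z))
    return 1.0 / (1.0 + np.exp(-z))

p_at_0 = sigmoid(intercept)          # height of the probability curve at x = 0
p_at_1 = sigmoid(intercept + slope)  # height of the probability curve at x = 1

print(f"x=0: logit {intercept:.4f}, probability {p_at_0:.4f}")
print(f"x=1: logit {intercept + slope:.4f}, probability {p_at_1:.4f}")
```

The intercept of about 2.03 is a log-odds value, which corresponds to a probability of roughly 0.88 at x = 0. That is why the regplot curve sits inside the 0 to 1 range while the axline from the question starts above 2.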