I have the following problem. I would like to do an in-sample prediction using logit from statsmodels.formula.api.
See my code:
import statsmodels.formula.api as smf
model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)
Up to this point everything works. Now I would like to do an in-sample prediction with my model:
yhat5 = model5_logit.predict(params=["dep", "var1", "var2", "var3"])
This gives the error ValueError: data type must provide an itemsize.
When I try:
yhat5 = model5_logit.predict(params="dep ~ var1 + var2 + var3")
I get another error: numpy.core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U69')) -> None
How can I do an in-sample forecast for the Logit model using statsmodels.formula.api?
This did not help me: How to predict new values using statsmodels.formula.api (python)
Using an example dataset:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
from sklearn.datasets import make_classification
# no random_state is set, so the numbers shown below will differ between runs
X, y = make_classification(n_features=3, n_informative=2, n_redundant=1)
model_data = pd.DataFrame(X, columns=['var1', 'var2', 'var3'])
model_data['dep'] = y
First fit the model (this step is missing from your code):
import statsmodels.formula.api as smf
model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)
res = model_logit.fit()
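The errors in the question come from what predict expects when it is called on the model object: its first argument, params, is the vector of estimated coefficients, so a list of column names or a formula string cannot be multiplied with the data. The sketch below uses a hypothetical synthetic dataset generated with NumPy (an assumption; it does not reproduce the make_classification data above) and shows that passing res.params to the model's predict gives the same in-sample probabilities as res.predict().

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data (an assumption, not the make_classification data above):
# three predictors and a binary outcome drawn from a known logistic model.
rng = np.random.default_rng(0)
n = 200
model_data = pd.DataFrame(rng.normal(size=(n, 3)), columns=["var1", "var2", "var3"])
eta = 0.5 + 1.0 * model_data["var1"] - 0.8 * model_data["var2"] + 0.3 * model_data["var3"]
model_data["dep"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)
res = model_logit.fit(disp=0)  # disp=0 silences the optimizer's convergence message

# The model's predict takes the estimated coefficients as params,
# not column names and not a formula string.
print(res.params.index.tolist())  # ['Intercept', 'var1', 'var2', 'var3']

# Passing the fitted coefficients reproduces the in-sample probabilities of res.predict().
probs_from_model = model_logit.predict(res.params)
probs_from_results = res.predict()
print(np.allclose(probs_from_model, probs_from_results))  # True
```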
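You can then get the in-sample predictions (as probabilities) and the predicted labels. Note that predict is called on the fitted results res, not on the model: with no arguments it returns the predicted probabilities for the data the model was fitted on. The model's own predict instead expects the estimated coefficients as params, which is why passing column names or a formula string fails: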
in_sample = pd.DataFrame({'prob':res.predict()})
in_sample['pred_label'] = (in_sample['prob']>0.5).astype(int)
in_sample.head()
       prob  pred_label
0  0.005401           0
1  0.911056           1
2  0.990406           1
3  0.412332           0
4  0.983642           1
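To see what res.predict() computes, the sketch below recomputes the probabilities by hand as the logistic function of the linear predictor, 1 / (1 + exp(-X @ params)), where X is the design matrix the formula produced (model_logit.exog, intercept column first). It also passes the original DataFrame back to res.predict; for a formula-based model statsmodels applies the formula to that DataFrame, which is the same route used for predicting new rows. The data are a hypothetical synthetic dataset generated with NumPy (an assumption, not the make_classification data above).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data (an assumption, not the make_classification data above)
rng = np.random.default_rng(0)
n = 200
model_data = pd.DataFrame(rng.normal(size=(n, 3)), columns=["var1", "var2", "var3"])
eta = 0.5 + 1.0 * model_data["var1"] - 0.8 * model_data["var2"] + 0.3 * model_data["var3"]
model_data["dep"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)
res = model_logit.fit(disp=0)

# In-sample probabilities as returned by statsmodels
prob = np.asarray(res.predict())

# 1) By hand: logistic function of the linear predictor X @ params,
#    where X is the design matrix built from the formula (intercept column first).
X = model_logit.exog
manual_prob = 1.0 / (1.0 + np.exp(-(X @ res.params.to_numpy())))
print(np.allclose(manual_prob, prob))  # True

# 2) Through the formula: passing the original DataFrame re-applies
#    "dep ~ var1 + var2 + var3" to it, so the in-sample rows give the same result.
formula_prob = res.predict(model_data)
print(np.allclose(formula_prob, prob))  # True

# The same call works for new rows; only the predictor columns are needed.
new_rows = pd.DataFrame({"var1": [0.0, 1.0], "var2": [0.0, -1.0], "var3": [0.0, 0.5]})
print(res.predict(new_rows))
```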
Then check the predictions against the actual labels:
pd.crosstab(in_sample['pred_label'],model_data['dep'])
dep          0   1
pred_label
0           46   2
1            4  48
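For a binary model the fitted results also provide pred_table, which builds the same kind of confusion matrix at a given threshold (0.5 by default). Its orientation is the transpose of the crosstab above: rows are the observed outcome and columns the predicted label. The sketch below checks it against pd.crosstab with matching orientation, on a hypothetical synthetic dataset generated with NumPy (an assumption, not the make_classification data above).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data (an assumption, not the make_classification data above)
rng = np.random.default_rng(0)
n = 200
model_data = pd.DataFrame(rng.normal(size=(n, 3)), columns=["var1", "var2", "var3"])
eta = 0.5 + 1.0 * model_data["var1"] - 0.8 * model_data["var2"] + 0.3 * model_data["var3"]
model_data["dep"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)
res = model_logit.fit(disp=0)

# Confusion matrix from statsmodels: rows = observed dep, columns = predicted label
table = res.pred_table(threshold=0.5)
print(table)

# Same counts with pandas, putting the observed outcome on the rows to match pred_table
pred_label = (np.asarray(res.predict()) > 0.5).astype(int)
counts = pd.crosstab(model_data["dep"], pred_label)
print(np.array_equal(table, counts.to_numpy()))

# In-sample accuracy: correct predictions sit on the diagonal
accuracy = np.trace(table) / table.sum()
print(f"in-sample accuracy: {accuracy:.3f}")
```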