How to predict using statsmodels.formula.api logit

I have the following problem. I would like to do an in-sample prediction using logit from statsmodels.formula.api.

See my code:

import statsmodels.formula.api as smf

model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)

So far everything is fine. But I would like to make in-sample predictions with my model:

yhat5 = model_logit.predict(params=["dep", "var1", "var2", "var3"])

This gives the error: ValueError: data type must provide an itemsize.

When I try:

yhat5 = model_logit.predict(params="dep ~ var1 + var2 + var3")

I get another error: numpy.core._exceptions._UFuncNoLoopError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U69')) -> None

How can I make in-sample predictions for the Logit model using statsmodels.formula.api?

This did not help me: How to predict new values using statsmodels.formula.api (python)

Solution

Using an example dataset:

import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_features=3, n_informative=2, n_redundant=1)
model_data = pd.DataFrame(X, columns=['var1', 'var2', 'var3'])
model_data['dep'] = y

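Fit the model (which I don't see in your code). On an unfitted model, predict expects params to be an array of numeric coefficients, which is why passing variable names or a formula string fails: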

# Build the model from the formula, then fit it to estimate the coefficients
model_logit = smf.logit(formula="dep ~ var1 + var2 + var3", data=model_data)
res = model_logit.fit()
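
To illustrate the difference between calling predict on the model and on the fitted results, here is a minimal sketch. The data set (demo) and the random generator (rng) are made up for illustration and are not the data from the question; the point is only that model.predict needs numeric coefficients, while res.predict() uses the estimated ones.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data, for illustration only
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame(rng.normal(size=(n, 3)), columns=["var1", "var2", "var3"])
# Noisy linear signal turned into a 0/1 outcome
latent = demo["var1"] - 0.5 * demo["var2"] + rng.normal(size=n)
demo["dep"] = (latent > 0).astype(int)

model = smf.logit("dep ~ var1 + var2 + var3", data=demo)
res = model.fit(disp=0)  # disp=0 silences the optimizer output

# On the model itself, params must be the numeric coefficient vector
# (Intercept, var1, var2, var3), not variable names or a formula string
manual = model.predict(params=np.asarray(res.params))

# On the fitted results, the estimated coefficients are used automatically
auto = res.predict()

same = np.allclose(manual, auto)
```

If this runs as I expect, same is True, so in practice you only ever need res.predict().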

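Calling predict() on the fitted results with no arguments uses the data the model was fitted on, so you get the in-sample predictions (as probabilities) and, from those, the predicted labels: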

in_sample = pd.DataFrame({'prob': res.predict()})
in_sample['pred_label'] = (in_sample['prob'] > 0.5).astype(int)

in_sample.head()
 
       prob  pred_label
0  0.005401           0
1  0.911056           1
2  0.990406           1
3  0.412332           0
4  0.983642           1
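
You can also pass a DataFrame to predict explicitly; with the formula interface it only needs the predictor columns, and the formula is applied to it for you. A small sketch, again on made-up data (demo, rng and new_rows are hypothetical names), showing that predicting on the first few training rows matches the in-sample result:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data, for illustration only
rng = np.random.default_rng(1)
n = 200
demo = pd.DataFrame(rng.normal(size=(n, 3)), columns=["var1", "var2", "var3"])
latent = demo["var1"] - 0.5 * demo["var2"] + rng.normal(size=n)
demo["dep"] = (latent > 0).astype(int)

res = smf.logit("dep ~ var1 + var2 + var3", data=demo).fit(disp=0)

# Only the right-hand-side variables are required; the intercept is
# added by the formula machinery
new_rows = demo[["var1", "var2", "var3"]].head(5)
prob_new = res.predict(new_rows)

# Same rows, taken from the in-sample predictions
prob_in_sample = np.asarray(res.predict())[:5]
matches = np.allclose(prob_new, prob_in_sample)
```

The same call works for genuinely new observations, as long as the DataFrame has the columns named in the formula.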

We can then check the predictions against the actual labels:

pd.crosstab(in_sample['pred_label'],model_data['dep'])
 
dep          0   1
pred_label        
0           46   2
1            4  48
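
As an alternative to building the table by hand, the fitted results object has a pred_table method. A minimal sketch on made-up data (demo and rng are hypothetical names): note that its rows are the observed classes and its columns the predicted ones, so it is the transpose of the crosstab layout above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data, for illustration only
rng = np.random.default_rng(2)
n = 200
demo = pd.DataFrame(rng.normal(size=(n, 3)), columns=["var1", "var2", "var3"])
latent = demo["var1"] - 0.5 * demo["var2"] + rng.normal(size=n)
demo["dep"] = (latent > 0).astype(int)

res = smf.logit("dep ~ var1 + var2 + var3", data=demo).fit(disp=0)

# 2x2 table: rows are observed classes, columns are predicted classes
table = res.pred_table(threshold=0.5)

# Correct predictions sit on the diagonal
accuracy = np.trace(table) / table.sum()

# Same quantity computed directly from the thresholded probabilities
pred_label = (res.predict() > 0.5).astype(int)
manual_accuracy = (pred_label == demo["dep"].to_numpy()).mean()
```

Both accuracy values should agree; keep in mind this is in-sample accuracy, so it will usually look better than performance on unseen data.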