python, machine-learning, scikit-learn, logistic-regression, statsmodels

How to predict new values using statsmodels.formula.api (python)


I trained a logistic regression model on the breast cancer data, using ONLY one feature, 'mean_area':

from statsmodels.formula.api import logit
logistic_model = logit('target ~ mean_area',breast)
result = logistic_model.fit()

The fitted result has a built-in predict method. However, when called without arguments, it returns the predicted values for all the training samples:

predictions = result.predict()

Suppose I want the prediction for a new value, say 30. How do I use the trained model to output that prediction (rather than reading off the coefficients and computing it manually)?


Solution

  • You can pass new values to the .predict() method, as illustrated for a single observation in output #11 in this notebook from the docs. You can pass multiple observations as a 2D array, for instance a DataFrame - see docs.

    Since you are using the formula API, your input needs to be a pd.DataFrame so that the column names referenced in the formula are available. In your case, you could use something like result.predict(pd.DataFrame({'mean_area': [1, 2, 3]})).

    By default, when no new data is supplied, statsmodels' .predict() uses only the observations that were used for fitting.
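As a minimal sketch of the approach described above, the following fits the same formula and predicts for the new value 30. The data here is synthetic and only stands in for the breast cancer DataFrame from the question (the coefficients used to generate it are made up for illustration), so the numbers will differ from the real dataset. The manual calculation at the end is included only to check that predict() applies the fitted coefficients as expected.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# ASSUMPTION: synthetic stand-in for the question's `breast` DataFrame.
# The generating coefficients (5.0, -0.008) are invented for illustration.
rng = np.random.default_rng(0)
mean_area = rng.uniform(100, 1500, size=300)
prob = 1 / (1 + np.exp(-(5.0 - 0.008 * mean_area)))
breast = pd.DataFrame({
    "mean_area": mean_area,
    "target": rng.binomial(1, prob),
})

# Same model as in the question (disp=0 only silences the optimizer output).
result = logit("target ~ mean_area", breast).fit(disp=0)

# No argument: fitted probabilities for the training observations.
in_sample = result.predict()

# New data: a DataFrame whose column name matches the formula term.
new_data = pd.DataFrame({"mean_area": [30]})
new_pred = result.predict(new_data)

# Several new observations at once work the same way.
many_pred = result.predict(pd.DataFrame({"mean_area": [1, 2, 3]}))

# Sanity check against the manual computation from the coefficients.
b0 = result.params["Intercept"]
b1 = result.params["mean_area"]
manual = 1 / (1 + np.exp(-(b0 + b1 * 30)))

print(new_pred)
print(manual)
```

The value returned for the new DataFrame is the predicted probability that target equals 1, so a class label would still need a threshold (for example 0.5) applied afterwards.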