Tags: python, machine-learning, classification, logistic-regression, confusion-matrix

Model precision is 0% in confusion matrix


I am trying to predict a binary outcome using logistic regression in Python, and my classification_report shows that my model has 0% precision for target variable = 0, but 87% precision for target variable = 1.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

X = df[['RegDec', 'SchoolDiv', 'SEX', 'Honor', 'TestOptional', 'TERRITORY', 'AcadamicIndex',
        'INSTAward', 'NEED', 'TOTAWD', 'ETHN3', 'IR_Total', 'pell']]
y = df['Retained']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)


logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)

predictions = logmodel.predict(X_test)
print(classification_report(y_test, predictions))

Why is my precision for '0' equal to 0? This is the output:

 precision    recall  f1-score   support

           0       0.00      0.00      0.00        92
           1       0.87      1.00      0.93       614

    accuracy                           0.87       706
   macro avg       0.43      0.50      0.47       706
weighted avg       0.76      0.87      0.81       706
confusion_matrix(y_test, predictions)  # not predicting 0s

array([[  0,  92],
       [  0, 614]], dtype=int64)

I want to know whether I have errors that are affecting my results.


Solution

  • Your Confusion Matrix:

    [  0,  92]
    [  0, 614]
    

    tells you that you have 92 elements of class 0 and 614 of class 1 in your test set.

    It seems that no matter what data you feed your classifier, it predicts 1.

    Without seeing your data we can only guess what's wrong:

    Either your data does not "contain" enough information to predict the label, so your classifier just "guesses" the most frequent class, or you have so much more class-1 data than class-0 data that accuracy is higher if the model always predicts 1 than if it tries to classify correctly.
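
    A quick way to check which of the two it is: look at the class balance of your target before training. A minimal sketch, assuming y is the pandas Series from your question:

        # Class counts and proportions of the target variable
        print(y.value_counts())
        print(y.value_counts(normalize=True))

    In your test set alone, 614 of 706 samples are class 1 (about 87%), so a model that always predicts 1 already reaches the 0.87 accuracy shown in your report.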

    Things you can do:

    1.) You could remove some class-1 rows from your data so that you have the same number of class-0 and class-1 rows in your training set (or get more class-0 data from somewhere); see the first sketch below.

    2.) Another classifier might fit your data better than logistic regression; you could try decision trees, SVM, AdaBoost, etc. and compare the results; see the second sketch below.

    3.) If it's a real-life problem, try to get more and better data, for example with better sensors, from different sources, or through feature engineering.
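
    For option 1.), here is a minimal sketch of two ways to handle the imbalance: randomly undersampling the majority class with pandas, or keeping all the data and using scikit-learn's class_weight='balanced' option. It assumes the X_train/y_train split and the 'Retained' column from your question:

        import pandas as pd
        from sklearn.linear_model import LogisticRegression

        # Option A: undersample class 1 so both classes have the same number of rows
        train = X_train.copy()
        train['Retained'] = y_train
        class_0 = train[train['Retained'] == 0]
        class_1 = train[train['Retained'] == 1].sample(n=len(class_0), random_state=42)
        balanced = pd.concat([class_0, class_1]).sample(frac=1, random_state=42)  # shuffle

        logmodel = LogisticRegression(max_iter=1000)  # max_iter raised to avoid possible convergence warnings
        logmodel.fit(balanced.drop(columns='Retained'), balanced['Retained'])

        # Option B: keep all rows, but weight classes inversely to their frequency
        logmodel = LogisticRegression(class_weight='balanced', max_iter=1000)
        logmodel.fit(X_train, y_train)

    After either change, look at precision and recall for class 0 again; overall accuracy will usually drop, which is expected.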
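
    For option 2.), a rough way to compare a few other classifiers on the same split (a quick sketch, not a tuned benchmark; it reuses X_train, X_test, y_train, y_test from your question):

        from sklearn.tree import DecisionTreeClassifier
        from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
        from sklearn.svm import SVC
        from sklearn.metrics import classification_report

        models = {
            'decision tree': DecisionTreeClassifier(),
            'random forest': RandomForestClassifier(),
            'adaboost': AdaBoostClassifier(),
            'svm': SVC(),
        }

        # Fit each model and print its per-class precision/recall
        for name, model in models.items():
            model.fit(X_train, y_train)
            print(name)
            print(classification_report(y_test, model.predict(X_test)))

    Note that these models may still favour the majority class, so the balancing ideas above apply to them as well.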