Tags: machine-learning, scikit-learn, metrics

I have a classification problem where I am getting 99.9% accuracy, but precision, recall, and F1 all come out as 0


After Average Ensemble classification, I am getting a weird confusion matrix and even weirder metric scores.

Code:

import pandas as pd
from imblearn.over_sampling import SMOTE

x = data_train[categorical_columns + numerical_columns]
y = data_train['target']

# fit_resample replaces the deprecated fit_sample in recent imblearn versions
x_sample, y_sample = SMOTE().fit_resample(x, y.values.ravel())

x_sample = pd.DataFrame(x_sample)
y_sample = pd.DataFrame(y_sample)

# checking the sizes of the sample data
print("Size of x-sample :", x_sample.shape)
print("Size of y-sample :", y_sample.shape)
# Train-Test split.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x_sample, y_sample, 
                                                    test_size=0.40, 
                                                    shuffle=False)

Accuracy is 99.9%, but recall, F1-score, and precision are all 0. I have never faced this problem before. I used the AdaBoost classifier.

Confusion Matrix for ADB: 
 [[46399    25]
 [    0     0]]
Accuracy for ADB: 
 0.9994614854385663
Precision for ADB: 
 0.0
Recall for ADB: 
 0.0
f1_score for ADB: 
 0.0

Since it is an imbalanced dataset, I used SMOTE. Now I am getting the following results:

Confusion Matrix for ETC: 
 [[    0     0]
 [  336 92002]]
Accuracy for ETC: 
 0.99636119474106
Precision for ETC: 
 1.0
Recall for ETC: 
 0.99636119474106
f1_score for ETC: 
 0.9981772811109906

Solution

  • This is happening because you have an imbalanced dataset (99.9% 0's and only 0.1% 1's). In such scenarios, using accuracy as a metric can be misleading.

    You can read more about which metrics to use in such scenarios here
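To make the accuracy paradox concrete, here is a minimal sketch on toy data (the shapes and column names are made up, not taken from the question): a model that always predicts the majority class scores 99.9% accuracy while precision, recall, and F1 for the minority class are all 0. The same sketch also illustrates why `shuffle=False` in `train_test_split` is dangerous here: if the minority rows sit at the end of the data (as SMOTE's appended synthetic samples do), one split can end up with no minority samples at all, which matches the all-zero rows in the confusion matrices above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Toy stand-in for the real data: 99.9% zeros, with the 10 minority
# rows at the end (SMOTE likewise appends synthetic minority rows).
y = pd.Series([0] * 9990 + [1] * 10, name="target")
X = pd.DataFrame({"feat": np.arange(len(y))})

# A model that always predicts the majority class looks excellent on
# accuracy but is useless on the minority class:
y_pred = np.zeros(len(y), dtype=int)
print(accuracy_score(y, y_pred))                    # 0.999
print(precision_score(y, y_pred, zero_division=0))  # 0.0
print(recall_score(y, y_pred, zero_division=0))     # 0.0
print(f1_score(y, y_pred, zero_division=0))         # 0.0

# shuffle=False takes the LAST 40% of rows as the test set, so every
# minority sample lands in the test split and the training split never
# sees class 1 -- hence the all-zero rows in the confusion matrices.
_, _, y_train, y_test = train_test_split(X, y, test_size=0.40,
                                         shuffle=False)
print(y_train.sum(), y_test.sum())  # 0 10

# Shuffling and stratifying keeps both classes in both splits:
_, _, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.40, shuffle=True, stratify=y, random_state=42
)
print(y_train_s.sum(), y_test_s.sum())  # positives in both splits
```

With the stratified shuffled split, precision, recall, and F1 become meaningful again because both splits actually contain minority examples to score against.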