python machine-learning scikit-learn multiclass-classification

Multiclass classification using GaussianNB gives the same value for accuracy, precision and F1 score

I am new to Python and classification algorithms. I am using GaussianNB for multiclass classification of the NSL-KDD dataset, and in the end I need to obtain the precision, recall, and F1 score.

from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix, zero_one_loss
from sklearn.metrics import classification_report

from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
y_pred = gnb.fit(train_x, train_Y).predict(test_x)

results_nm = confusion_matrix(test_Y,y_pred)
#print(results_nm)
print(classification_report(test_Y,y_pred))
print(accuracy_score(test_Y,y_pred))
# pos_label is only used with average='binary', so it is omitted here.
print("Precision Score : ", precision_score(test_Y, y_pred, average='micro'))
print("Recall Score : ", recall_score(test_Y, y_pred, average='micro'))
print("F1 Score : ", f1_score(test_Y, y_pred, average='micro'))

I followed the instructions in a similar question at sklearn metrics for multiclass classification.

The code runs, but I am getting the same value for accuracy, precision, and F1 score. What could be the reason for that?

Solution

This can happen. As your confusion matrix shows, the micro-averaged values of all three metrics are in fact the same.

In the micro-average method, you sum up the individual true positives, false positives, and false negatives of the system for different sets and apply them to get the statistics. For example, for a set of data, the system's

True positive (TP1)  = 12
False positive (FP1) = 5
False negative (FN1) = 10

Then precision (P1) and recall (R1) will be 12/(12+5) ≈ 0.706 and 12/(12+10) ≈ 0.545.

They differ here because FP1 != FN1. If FP1 == FN1, precision and recall would be the same.

and for a different set of data, the system's

True positive (TP2)  = 50
False positive (FP2) = 7
False negative (FN2) = 7

Then precision (P2) and recall (R2) will be the same, 50/(50+7) ≈ 0.877, because FP2 == FN2.

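Now, the average precision and recall of the system using the micro-average method are

Micro-average precision = (TP1 + TP2) / (TP1 + TP2 + FP1 + FP2) = (12 + 50) / (12 + 50 + 5 + 7) = 62/74 ≈ 0.838
Micro-average recall    = (TP1 + TP2) / (TP1 + TP2 + FN1 + FN2) = (12 + 50) / (12 + 50 + 10 + 7) = 62/79 ≈ 0.785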

The Micro-average F-Score will be simply the harmonic mean of these two figures.
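
As a quick check of the arithmetic, here is a minimal sketch that pools the counts from the two example sets above and computes the micro-averaged precision, recall, and F-score. The variable names are chosen for illustration only.

```python
# Counts from the two example sets above: (TP1, TP2), (FP1, FP2), (FN1, FN2).
tp = [12, 50]
fp = [5, 7]
fn = [10, 7]

# Micro-averaging pools the counts first, then computes each ratio once.
micro_precision = sum(tp) / (sum(tp) + sum(fp))  # 62 / 74
micro_recall = sum(tp) / (sum(tp) + sum(fn))     # 62 / 79

# The F-score is the harmonic mean of precision and recall.
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(f"micro precision: {micro_precision:.4f}")  # 0.8378
print(f"micro recall:    {micro_recall:.4f}")     # 0.7848
print(f"micro F1:        {micro_f1:.4f}")         # 0.8105
```

Here the three values differ because the pooled false positives (12) and false negatives (17) are not equal.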

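So, for specific values of FP and FN, it's possible for all these metrics to be the same. From the micro-average definitions, if FP1 + FP2 == FN1 + FN2, then the micro-average precision and recall are equal, and so is their harmonic mean, the F-score. In single-label multiclass classification this always holds when every class is counted: each misclassified sample is one false negative for its true class and one false positive for its predicted class, so the micro-averaged precision, recall, and F1 all equal the accuracy.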
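
To see why the pooled FP and FN totals match in a single-label multiclass problem, here is a small sketch in plain Python. The labels are hypothetical, invented for illustration, and are not taken from the NSL-KDD data. Every wrong prediction adds one false negative to the true class and one false positive to the predicted class, so the totals are equal by construction.

```python
from collections import Counter

# Hypothetical labels, invented for illustration only.
y_true = ["dos", "dos", "normal", "normal", "normal", "probe", "probe", "r2l"]
y_pred = ["dos", "normal", "normal", "normal", "probe", "probe", "dos", "normal"]

tp, fp, fn = Counter(), Counter(), Counter()
for truth, pred in zip(y_true, y_pred):
    if truth == pred:
        tp[truth] += 1
    else:
        fn[truth] += 1  # the true class missed this sample
        fp[pred] += 1   # the predicted class claimed a sample it should not have

total_tp = sum(tp.values())
total_fp = sum(fp.values())
total_fn = sum(fn.values())

micro_precision = total_tp / (total_tp + total_fp)
micro_recall = total_tp / (total_tp + total_fn)
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
accuracy = total_tp / len(y_true)

print(total_fp, total_fn)                                 # 4 4
print(micro_precision, micro_recall, micro_f1, accuracy)  # 0.5 0.5 0.5 0.5
```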

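Since the micro-averaged scores can coincide like this, try the macro average or the weighted average instead (average='macro' or average='weighted' in scikit-learn). Those compute the metrics per class before averaging, so they reflect per-class differences.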
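
A minimal sketch of that comparison with scikit-learn follows. The labels are hypothetical and invented for illustration; with your data you would pass test_Y and y_pred instead. It assumes scikit-learn 0.22 or later, where the zero_division argument is available (used here because one class is never predicted).

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels, invented for illustration only.
y_true = ["dos", "dos", "normal", "normal", "normal", "probe", "probe", "r2l"]
y_pred = ["dos", "normal", "normal", "normal", "probe", "probe", "dos", "normal"]

accuracy = accuracy_score(y_true, y_pred)
print("accuracy:", accuracy)

# Compare the three averaging modes side by side.
scores = {}
for average in ("micro", "macro", "weighted"):
    scores[average] = (
        precision_score(y_true, y_pred, average=average, zero_division=0),
        recall_score(y_true, y_pred, average=average, zero_division=0),
        f1_score(y_true, y_pred, average=average, zero_division=0),
    )
    print(average, scores[average])
```

With these labels the micro row repeats the accuracy three times, while the macro and weighted rows give distinct precision, recall, and F1 values.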