Tags: scikit-learn, multiclass-classification, precision-recall

Why do I get different precision, recall and F1 scores for different methods of calculating the macro average?


I calculated the macro-averaged precision, recall, and F1 of my classification using two methods. Method 1 is:

print("Macro-Average Precision:", metrics.precision_score(predictions, test_y, average='macro'))
print("Macro-Average Recall:", metrics.recall_score(predictions, test_y, average='macro'))
print("Macro-Average F1:", metrics.f1_score(predictions, test_y, average='macro'))

which gave this result:

Macro-Average Precision: 0.6822
Macro-Average Recall: 0.7750
Macro-Average F1: 0.7094

Method 2 is:

from sklearn.metrics import classification_report

print(classification_report(y_true, y_pred))

which gave this result:

              precision    recall  f1-score   support

           0       0.55      0.25      0.34       356
           1       0.92      0.96      0.94      4793
           2       0.85      0.83      0.84      1047

    accuracy                           0.90      6196
   macro avg       0.78      0.68      0.71      6196
weighted avg       0.89      0.90      0.89      6196

I expected the output of both methods to be the same, since they were generated in the same run. Can someone explain why this happened, or whether there is a mistake somewhere?


Solution

  • As far as I can tell from the classification_report output, you have a multiclass problem with three classes (0, 1 and 2).

    If you check the documentation for the individual functions in the metrics module, their signature is (y_true, y_pred): the ground truth comes first and the predictions second. In your first snippet the arguments are swapped, since you pass predictions first and test_y second, while classification_report(y_true, y_pred) has them in the correct order.

    Swapping y_true and y_pred turns every false positive into a false negative and vice versa, so precision and recall trade places for each class, while the per-class F1 score stays the same because it is the harmonic mean of the two. That matches your numbers: Method 1 reports macro precision 0.68 and macro recall 0.78, the report shows macro precision 0.78 and macro recall 0.68, and both give a macro F1 of about 0.71. The sketch below demonstrates the effect.
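
    Here is a minimal sketch of the swap with made-up labels (not the asker's data), assuming only scikit-learn is installed:

    from sklearn.metrics import precision_score, recall_score, f1_score

    # Made-up multiclass labels, purely to illustrate the argument order;
    # these are not the data from the question.
    y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1]
    y_pred = [0, 1, 1, 1, 2, 2, 2, 1, 2, 1]

    # Correct order: (y_true, y_pred)
    print(precision_score(y_true, y_pred, average='macro'))  # ~0.7833
    print(recall_score(y_true, y_pred, average='macro'))     # ~0.6667

    # Swapped order: precision and recall trade places
    print(precision_score(y_pred, y_true, average='macro'))  # ~0.6667
    print(recall_score(y_pred, y_true, average='macro'))     # ~0.7833

    # Macro F1 comes out the same either way, since each class's F1 is the
    # harmonic mean of its precision and recall, which is symmetric.
    print(f1_score(y_true, y_pred, average='macro'))
    print(f1_score(y_pred, y_true, average='macro'))

    Once the arguments are passed in the correct order, the standalone metric calls line up with the macro avg row of classification_report.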