python, machine-learning, scikit-learn, classification, metrics

Why does sklearn return the same value for accuracy and weighted-average recall in binary classification?


My problem is a binary classification where I use the following code to get the accuracy and weighted average recall.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

clf = RandomForestClassifier(random_state=0, class_weight="balanced")

# X is the feature matrix and y holds the binary labels (both defined elsewhere)
cross_validate(clf, X, y, cv=10, scoring=('accuracy', 'precision_weighted', 'recall_weighted', 'f1_weighted'))

I noticed that the values of accuracy and weighted-average recall are equal. However, as I understand it, these two metrics capture different aspects of performance, so it is not clear to me why they are exactly equal.

I found a post that asks a similar question: https://www.researchgate.net/post/Multiclass_classification_micro_weighted_recall_equals_accuracy. However, I did not find the answers to that post useful.

I am happy to provide more details if needed.


Solution

  • Accuracy is:

    (TP + TN) / (P + N)
    

    So let's assume you have 50 positive samples and 50 negative samples, and the model predicts 25 of the positive samples and 25 of the negative samples correctly. Then:

    (25 + 25) / (50 + 50) = 0.5
    

    Weighted-average recall: first compute the recall of each class. Positive class: TP / P = 25 / 50 = 0.5. Negative class: TN / N = 25 / 50 = 0.5.

    Weighted recall:

    ((recall_positive * number_positive) + (recall_negative * number_negative)) / (number_positive + number_negative) = (0.5 * 50 + 0.5 * 50) / (50 + 50) = 50 / 100 = 0.5
    

    I hope this helps to understand that it can happen!
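
    The worked example uses balanced classes, but the same cancellation appears to hold for any class distribution. Each class's recall is (correct in class) / (class size), and the weight is the class size, so the sizes cancel and the sum is just the total number of correct predictions divided by the number of samples, which is accuracy. The sketch below checks this on an imbalanced case in plain Python. The helper names `accuracy` and `weighted_recall` and the data are made up for illustration; they are intended to mirror what `accuracy_score` and `recall_score(..., average="weighted")` compute in scikit-learn, not to reproduce its implementation.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with each class's support as its weight."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    total = 0.0
    for label, n in support.items():
        recall = hits[label] / n  # recall of this class
        total += recall * n       # weighting by n cancels the division by n
    return total / len(y_true)


# Hypothetical imbalanced data: 80 negatives (0) and 20 positives (1).
y_true = [0] * 80 + [1] * 20
# 60 of the negatives and 5 of the positives are predicted correctly.
y_pred = [0] * 60 + [1] * 20 + [1] * 5 + [0] * 15

acc = accuracy(y_true, y_pred)
wrec = weighted_recall(y_true, y_pred)
print(acc, wrec)  # both are 0.65
```

    Here the per-class recalls differ (0.75 for the negatives, 0.25 for the positives), yet the weighted average still equals accuracy. If the goal is a recall figure that is not identical to accuracy, the unweighted mean of the per-class recalls (`recall_macro` in scikit-learn's scoring names) is one option, since it would give 0.5 on this data.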