
Why are there two different results for the same metrics in sklearn?


What is the difference between the recall, precision, and f1-score reported by metrics.classification_report and the same metrics computed separately with metrics.precision_score, metrics.recall_score, and metrics.f1_score?

Can someone please look at this code and explain the difference?

from sklearn import metrics
from sklearn.svm import LinearSVC

# X_train, y_train, X_test, y_test come from an earlier train/test split
clf_svm_linear = LinearSVC(C=20.0)
clf_svm_linear.fit(X_train, y_train)
y_pred = clf_svm_linear.predict(X_test)

print('Results on validation data')
print(metrics.classification_report(y_test, y_pred, target_names=['No Diabetes', 'Diabetes']))
print("===================================================================")
print("The accuracy on validation dataset of Linear SVM: \t", metrics.accuracy_score(y_test, y_pred))
print("Precision on validation dataset of Linear SVM:    \t", metrics.precision_score(y_test, y_pred))
print("Recall on validation dataset of Linear SVM:       \t", metrics.recall_score(y_test, y_pred))
print("F1 score on validation dataset of Linear SVM:     \t", metrics.f1_score(y_test, y_pred))

When I run the above code I get the results shown in the attached screenshot. Why does the avg/total row of the report not match the precision, recall, and f1-score values when I print them independently?


Solution

  • precision_score is not an average: by default it returns the precision of whichever class is treated as positive (in your case, Diabetes). That is why your precision_score call matches the Diabetes row of the report, and the same goes for recall_score and f1_score. Averaging this asymmetric, per-class metric over both classes balances it, so it is not the same number as the single-class ("regular") metric.

    In order to compute that (unweighted, macro) average yourself, you would call

    print(0.5 * (metrics.precision_score(y_test, y_pred, pos_label=0) +
                 metrics.precision_score(y_test, y_pred, pos_label=1)))

    Note that pos_label expects the actual label values (here presumably 0 and 1); the target_names passed to classification_report only rename the classes for display.