I am trying to get the mean precision and recall for BOTH the positive and the negative class in a 10-fold cross-validation. My model is a binary classifier.
I ran the code below, but unfortunately it only returned the mean precision and recall for the positive class. How can I tell the algorithm to return the mean precision and recall scores for the negative class as well?
import numpy as np
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import cross_validate

scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score),
           'recall': make_scorer(recall_score),
           'f1_score': make_scorer(f1_score)}

results = cross_validate(model_unbalanced_data_10_times_weight, X, Y, cv=10, scoring=scoring)

# Average each metric over the 10 folds.
np.mean(results['test_precision'])
np.mean(results['test_recall'])
I've also tried printing the classification report with classification_report(y_test, predictions), which produced the printout in the screenshot below. However, I believe the precision/recall scores from the classification report are based on one run only, not on the average over the 10 folds (correct me if I am wrong).
Based on our discussion above, I believe that computing predictions for every CV fold with cross_val_predict and running classification_report on them is the right way to go. The resulting report then takes all of the CV folds into account:
>>> from sklearn.metrics import classification_report
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import cross_val_predict
>>>
>>> iris = load_iris()
>>>
>>> rf_clf = RandomForestClassifier()
>>>
>>> preds = cross_val_predict(estimator=rf_clf,
... X=iris["data"],
... y=iris["target"],
... cv=15)
>>>
>>> print(classification_report(iris["target"], preds))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.92      0.94      0.93        50
           2       0.94      0.92      0.93        50

    accuracy                           0.95       150
   macro avg       0.95      0.95      0.95       150
weighted avg       0.95      0.95      0.95       150
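The per-class rows of the report (here 0, 1, 2; in your binary case 0 and 1) give the precision and recall for each class, including the negative one.

If you would rather stay with cross_validate and get per-fold means directly, you can register one scorer per class by passing pos_label through make_scorer. Here is a minimal sketch for the binary case (the toy dataset and model below are stand-ins for your X, Y, and model):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_validate

# Toy imbalanced binary problem standing in for the question's data.
X, Y = make_classification(n_samples=500, weights=[0.7, 0.3], random_state=0)
model = RandomForestClassifier(random_state=0)

# make_scorer forwards keyword arguments to the metric, so pos_label
# selects which class each precision/recall scorer reports on.
scoring = {'precision_pos': make_scorer(precision_score, pos_label=1),
           'recall_pos': make_scorer(recall_score, pos_label=1),
           'precision_neg': make_scorer(precision_score, pos_label=0),
           'recall_neg': make_scorer(recall_score, pos_label=0)}

results = cross_validate(model, X, Y, cv=10, scoring=scoring)
for name in scoring:
    # Mean of each class-specific metric over the 10 folds.
    print(name, np.mean(results['test_' + name]))

And if you need the numbers programmatically rather than printed, classification_report(y_true, y_pred, output_dict=True) returns the same per-class metrics as a nested dictionary.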