I am working with a highly imbalanced dataset (many samples in class 0 and few in class 1). To analyse the classifier's performance I am using the F1 metric. I set average=None in scikit-learn's f1_score because I want to check the performance on class 0 and class 1 separately, and I am only concerned about the classifier's performance on class 1.
value = f1_score(yTest, y_scores, average=None)
value[1] then gives me the value I need (the F1 score for class 1).
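For reference, a minimal self-contained sketch of what I mean (the labels and predictions below are made up, purely for illustration):

from sklearn.metrics import f1_score

yTest = [0, 0, 0, 0, 1, 1]      # toy labels
y_scores = [0, 0, 0, 1, 1, 0]   # toy predictions
value = f1_score(yTest, y_scores, average=None)
print(value)     # one F1 per class: index 0 is class 0, index 1 is class 1
print(value[1])  # the class-1 score I care about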
Now, for hyperparameter tuning with GridSearchCV, I create the F1 scorer in the following way:
f1_scorer = make_scorer(f1_score, average=None)
However, this gives an array, which is not accepted by GridSearchCV(svc_clf, param_grid, cv=nfolds, error_score=0.0, scoring=f1_scorer).
How do I extract the value at index 1 so that it can be used as the metric for the scoring parameter? I want the tuning to focus on the classifier's performance on class 1.
I did try some naive things such as f1_scorer[1], but that just gives '_PredictScorer' object is not subscriptable.
If you only care about the F1 score for the positive class, then the defaults average='binary' and pos_label=1 should be fine.
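For example, something along these lines should work (an untested sketch reusing your svc_clf, param_grid and nfolds; grid_search is just an illustrative name):

from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

# With the defaults (average='binary', pos_label=1), f1_score returns a single
# number: the F1 of class 1, i.e. the same value as f1_score(..., average=None)[1]
f1_scorer = make_scorer(f1_score)

grid_search = GridSearchCV(svc_clf, param_grid, cv=nfolds,
                           error_score=0.0, scoring=f1_scorer)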
If you needed to in another situation, you should be able to define a thin wrapper function:
from sklearn.metrics import f1_score

def pos_f1_score(estimator, X, y):
    # predict with the fitted estimator, compute per-class F1, keep only class 1
    y_pred = estimator.predict(X)
    f1s = f1_score(y, y_pred, average=None)
    return f1s[1]
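Since this callable already has the (estimator, X, y) signature that GridSearchCV accepts for scoring, you can pass it directly, without make_scorer, e.g.:

grid_search = GridSearchCV(svc_clf, param_grid, cv=nfolds,
                           error_score=0.0, scoring=pos_f1_score)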
Finally, a number of common metrics have already been made into scorers that you can invoke by passing a string to the scoring parameter; see the list of predefined values in the scikit-learn documentation.
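In this case, for example, the built-in 'f1' scorer is the binary F1 of the positive class (class 1), so you could skip the custom scorer entirely:

grid_search = GridSearchCV(svc_clf, param_grid, cv=nfolds,
                           error_score=0.0, scoring='f1')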