I am working with a highly imbalanced dataset (many samples in class 0 and few in class 1). To analyse the classifier's performance I am using the F1 metric. I set average=None in scikit-learn's f1_score because I want to check the performance on class 0 and class 1 separately, and I am only concerned about the classifier's performance on class 1.
value = f1_score(yTest, y_scores, average=None)
value[1] then gives me the value I need (the F1 score for class 1).
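For reference, a minimal self-contained sketch of what I mean (the labels and predictions below are made up, purely for illustration):

from sklearn.metrics import f1_score

yTest = [0, 0, 0, 0, 1, 1]      # toy labels
y_scores = [0, 0, 0, 1, 1, 0]   # toy predictions
value = f1_score(yTest, y_scores, average=None)
print(value)     # one F1 per class: index 0 is class 0, index 1 is class 1
print(value[1])  # the class-1 score I care about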
Now, for hyperparameter tuning with GridSearchCV, I create the F1 scorer in the following way:
f1_scorer = make_scorer(f1_score, average=None)
However, this gives an array, which is not accepted by GridSearchCV(svc_clf, param_grid, cv=nfolds, error_score=0.0, scoring=f1_scorer).
How do I extract the value at index 1 so that it can be used as the metric for the scoring parameter? I want the tuning to focus on the classifier's performance on class 1.
I did try some naive things such as f1_scorer[1], but that just gives '_PredictScorer' object is not subscriptable.
If you only care about the F1 score for the positive class, then the defaults average='binary' and pos_label=1 should be fine.
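For example, something along these lines should work (an untested sketch reusing your svc_clf, param_grid and nfolds; grid_search is just an illustrative name):

from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV

# With the defaults (average='binary', pos_label=1), f1_score returns a single
# number: the F1 of class 1, i.e. the same value as f1_score(..., average=None)[1]
f1_scorer = make_scorer(f1_score)

grid_search = GridSearchCV(svc_clf, param_grid, cv=nfolds,
                           error_score=0.0, scoring=f1_scorer)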
If you needed to in another situation, you should be able to define a thin wrapper function:
from sklearn.metrics import f1_score

def pos_f1_score(estimator, X, y):
    # predict with the fitted estimator, compute per-class F1, keep only class 1
    y_pred = estimator.predict(X)
    f1s = f1_score(y, y_pred, average=None)
    return f1s[1]
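Since this callable already has the (estimator, X, y) signature that GridSearchCV accepts for scoring, you can pass it directly, without make_scorer, e.g.:

grid_search = GridSearchCV(svc_clf, param_grid, cv=nfolds,
                           error_score=0.0, scoring=pos_f1_score)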
Finally, a number of common metrics have already been made into scorers that you can invoke by passing a string to the scoring parameter; see the list of predefined values in the scikit-learn documentation.
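In this case, for example, the built-in 'f1' scorer is the binary F1 of the positive class (class 1), so you could skip the custom scorer entirely:

grid_search = GridSearchCV(svc_clf, param_grid, cv=nfolds,
                           error_score=0.0, scoring='f1')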