python scikit-learn cross-validation multilabel-classification

How to get F1 score per label using Sklearn's cross_validation (multi-label classification)


I am trying to do multi-label classification using sklearn's cross_val_score function (http://scikit-learn.org/stable/modules/cross_validation.html).

from sklearn import cross_validation
from sklearn.metrics import f1_score, make_scorer
scores = cross_validation.cross_val_score(clf, X_train, y_train,
        cv=10, scoring=make_scorer(f1_score, average=None))

I want the F1 score for each label returned. The per-label scores are computed for the first fold, but an error is raised right after:

ValueError: scoring must return a number, got [ 0.55555556  0.81038961  0.82474227  0.67153285  0.76494024  0.89087657 0.93502377  0.11764706  0.81611208] (<type 'numpy.ndarray'>)

I assume this error is raised because cross_val_score expects a number to be returned. Is there any other way I can use cross_val_score to get the F1-score per label?


Solution

  • I solved the problem by editing .../scikit-learn/sklearn/cross_validation.py. More specifically, I commented out these lines:

    if not isinstance(score, numbers.Number):
        raise ValueError("scoring must return a number, got %s (%s) instead."
                         % (str(score), type(score)))

    This removes the check that the score is a number, which allows a NumPy array to be returned by the scorer.
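
An alternative that leaves the library untouched is to run the folds by hand and call f1_score with average=None on each one. The sketch below is not from the original answer: it assumes a newer scikit-learn where KFold lives in sklearn.model_selection and f1_score accepts zero_division. The helper name per_label_f1_cv, the toy data, and the classifier are made up for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold
from sklearn.multiclass import OneVsRestClassifier


def per_label_f1_cv(clf, X, y, n_splits=10, seed=0):
    """Hypothetical helper: return an (n_splits, n_labels) array of F1 scores."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, test_idx in kf.split(X):
        model = clone(clf)  # fresh, unfitted copy for every fold
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[test_idx])
        # average=None yields one F1 value per label instead of a single number
        fold_scores.append(
            f1_score(y[test_idx], y_pred, average=None, zero_division=0)
        )
    return np.vstack(fold_scores)


# Toy multi-label data (assumption: y is a binary indicator matrix)
X, y = make_multilabel_classification(
    n_samples=200, n_features=20, n_classes=5, random_state=0
)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))

scores = per_label_f1_cv(clf, X, y, n_splits=5)
print(scores.mean(axis=0))  # mean F1 per label across folds
```

Each row of scores is one fold and each column is one label, so scores.mean(axis=0) gives the cross-validated F1 per label.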