scikit-learn calculate F1 in multilabel classification


I am trying to calculate macro-F1 with scikit-learn for a multi-label classification problem:

from sklearn.metrics import f1_score

y_true = [[1,2,3]]
y_pred = [[1,2,3]]

print(f1_score(y_true, y_pred, average='macro'))

However, it fails with the error message:

ValueError: multiclass-multioutput is not supported

How can I calculate macro-F1 for multi-label classification?


Solution

  • In the scikit-learn release current when this was written (before version 0.17), your code results in the following warning:

    DeprecationWarning: Direct support for sequence of sequences multilabel
        representation will be unavailable from version 0.17. Use
        sklearn.preprocessing.MultiLabelBinarizer to convert to a label
        indicator representation.
    

    Following this advice, you can use sklearn.preprocessing.MultiLabelBinarizer to convert the multilabel targets into the label indicator format that f1_score accepts. For example:

    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.metrics import f1_score
    
    y_true = [[1,2,3]]
    y_pred = [[1,2,3]]
    
    m = MultiLabelBinarizer().fit(y_true)
    
    f1_score(m.transform(y_true),
             m.transform(y_pred),
             average='macro')
    # 1.0
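
The example above fits the binarizer on y_true only, which is enough when every predicted label also occurs in the ground truth. As a rough sketch of a slightly more realistic case, the snippet below uses made-up data with several samples and fits on both lists, so that a label appearing only in the predictions (4 here) still gets its own column. The data and variable names are illustrative assumptions, and the zero_division argument assumes scikit-learn 0.22 or later.

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score

# Hypothetical data: three samples, each with its own set of labels.
y_true = [[1, 2, 3], [1], [2, 3]]
y_pred = [[1, 2], [1], [2, 3, 4]]

# Fit on the labels from both lists so that label 4, which is only
# ever predicted, is not dropped when y_pred is transformed.
mlb = MultiLabelBinarizer().fit(y_true + y_pred)
Y_true = mlb.transform(y_true)  # indicator matrix, shape (3 samples, 4 labels)
Y_pred = mlb.transform(y_pred)

# Per-label F1, in the order given by mlb.classes_.
# zero_division=0 (assumes scikit-learn >= 0.22) silences the warning for
# label 4, which has no true samples.
per_label = f1_score(Y_true, Y_pred, average=None, zero_division=0)

# Macro-F1 is the unweighted mean of the per-label scores.
macro = f1_score(Y_true, Y_pred, average='macro', zero_division=0)

print(mlb.classes_)  # [1 2 3 4]
print(per_label)     # labels 1 and 2 score 1.0, label 3 about 0.667, label 4 scores 0.0
print(macro)         # about 0.667
```

Whether a label that is only ever predicted should count toward the macro average is a design choice: fitting on y_true alone would ignore label 4 (with a warning about unknown classes) and give a higher score.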