Tags: python, numpy, scikit-learn, precision-recall

Why do I get a ValueError when passing 2D arrays to sklearn.metrics.recall_score?


I want to use sklearn.metrics.recall_score to evaluate recall for a binary image segmentation task. Doing this works:

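import numpy as np
from sklearn.metrics import recall_score
threshold = 0.5
# np.int was removed in NumPy 1.24; the builtin int does the same job here
predicted_mask = (probability_map > threshold).astype(int)
actual_mask = actual_mask.astype(int)
result = recall_score(actual_mask.flatten(), predicted_mask.flatten())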

This however:

result = recall_score(actual_mask, predicted_mask)

gives me the error:

ValueError: Target is multilabel-indicator but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted', 'samples'].

actual_mask and predicted_mask are 2D NumPy arrays containing only the integers 0 and 1.

It is not obvious to me from the documentation that this should not work (shown here for precision_score, which takes the same arguments as recall_score):

sklearn.metrics.precision_score(y_true, y_pred, *, labels=None, pos_label=1, average='binary', sample_weight=None, zero_division='warn')

y_true: 1d array-like, or label indicator array / sparse matrix

y_pred: 1d array-like, or label indicator array / sparse matrix

What am I missing? And more importantly: Is the recall-value that I obtain using the flatten operation correct?


Solution

  • I know it's late, but I will still answer since the documentation is not exactly clear. precision_score and recall_score do not treat 2D arrays as images. They interpret a 2D array of 0s and 1s as a multilabel indicator matrix, in which each row is one sample and each column is one label. Let's look at an example from the documentation.

    >>> y_true = [[0, 0, 0], [1, 1, 1], [0, 1, 1]]
    >>> y_pred = [[0, 0, 0], [1, 1, 1], [1, 1, 0]]
    >>> recall_score(y_true, y_pred, average=None)
    array([1. , 1. , 0.5])
    

    recall_score does not treat y_true or y_pred as a single 3x3 matrix. Instead, it scores each column as a separate binary problem: the first column of y_true and y_pred holds the ground truth and prediction for label 0 across the three samples, the second column for label 1, and so on. That is why average=None returns three values. With several labels there is no single positive class to report, so average='binary' does not work and the ValueError asks you to pick another averaging mode.
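To make the column-wise behaviour concrete, here is a minimal sketch that reuses the documentation example and compares the multilabel call with scoring each column on its own. The variable names per_label and per_column are only illustrative.

```python
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([[0, 0, 0], [1, 1, 1], [0, 1, 1]])
y_pred = np.array([[0, 0, 0], [1, 1, 1], [1, 1, 0]])

# Multilabel call: one recall value per column (label).
per_label = recall_score(y_true, y_pred, average=None)

# Score each column as an independent binary problem.
per_column = np.array([
    recall_score(y_true[:, j], y_pred[:, j])
    for j in range(y_true.shape[1])
])

print(per_label)   # recall per label from the 2D call
print(per_column)  # recall per column, computed one at a time
```

Both arrays should agree, which shows that the 2D input is split by column rather than evaluated as one image.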

    Is the recall-value that I obtain using the flatten operation correct?

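    Assuming your predicted_mask and actual_mask have the shape (Height, Width), as in a typical image segmentation task, then yes, it's correct. Flattening turns every pixel into one binary sample, so the result is the pixel-wise recall TP / (TP + FN) over the whole image.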
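As a quick sanity check, the sketch below compares the flattened call with a manual pixel count. The two small masks are made-up stand-ins for actual_mask and predicted_mask, not data from the question. It also shows that passing the 2D arrays with average='micro' pools true positives and false negatives over all columns, which should give the same number.

```python
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical 3x4 masks standing in for the real segmentation masks.
actual_mask = np.array([[1, 1, 0, 0],
                        [0, 1, 1, 0],
                        [0, 0, 0, 1]])
predicted_mask = np.array([[1, 0, 0, 0],
                           [0, 1, 1, 1],
                           [0, 0, 0, 0]])

# Approach from the question: flatten so each pixel is one binary sample.
flat_recall = recall_score(actual_mask.flatten(), predicted_mask.flatten())

# Manual pixel-wise recall: TP / (TP + FN).
tp = np.sum((actual_mask == 1) & (predicted_mask == 1))
fn = np.sum((actual_mask == 1) & (predicted_mask == 0))
manual_recall = tp / (tp + fn)

# 2D input with micro averaging: counts are pooled over all columns.
micro_recall = recall_score(actual_mask, predicted_mask, average="micro")

print(flat_recall, manual_recall, micro_recall)  # all three are 0.6 here
```

Flattening is the more direct choice for a single mask, since it states the intent (one binary problem over all pixels) without relying on the multilabel interpretation.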