sklearn jaccard_score giving a wrong result

I've used sklearn.metrics.jaccard_score to compute the Jaccard score from my Python model's binary classification test. Its output is shown below, but when I calculate the metric by hand I get a different value. Am I mistaken about what "Jaccard" means in this function? Or am I using it wrong? All the other metrics collected with sklearn functions return correct values. Here is my code, including the hand calculation of Jaccard (comparing the vectors as sets on a calculator gives the same value as my formula, which is only somewhat reassuring).

from sklearn.metrics import classification_report, confusion_matrix, jaccard_score

def test(X, y, model):
  predictions = model.predict(X, verbose=1).ravel()
  report = classification_report(y, predictions, target_names=['nao_doentes', 'doentes'])
  confMatrix = confusion_matrix(y, predictions)
  tn, fp, fn, tp = confMatrix.ravel()
  jaccard = jaccard_score(y, predictions) # Behaving strangely

  print(tn, fp, fn, tp)
  print(predictions)
  print(y)
  print(report)
  print(confMatrix)
  print("Jaccard by function: {}".format(jaccard))
  # Note that in binary classification, recall of the positive class is also known as “sensitivity”;
  # recall of the negative class is “specificity”.

  dice = ((2*tp) / ((2*tp) + fp + fn))
  jaccard = ((tp + tn) / ((2*(tp + tn + fn + fp)) - (tp + tn)))
  print("Jaccard by hand: {}".format(jaccard))

And this is the output:

2 0 1 1
[1. 0. 0. 0.]
[1 0 1 0]
              precision    recall  f1-score   support

 nao_doentes       0.67      1.00      0.80         2
     doentes       1.00      0.50      0.67         2

    accuracy                           0.75         4
   macro avg       0.83      0.75      0.73         4
weighted avg       0.83      0.75      0.73         4

[[2 0]
 [1 1]]
Jaccard by function: 0.5
Jaccard by hand: 0.6
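A minimal, self-contained sketch that reproduces both numbers, assuming the y and predictions vectors printed in the output above (so no trained model is needed). The variable names by_function, by_counts, by_sets, from_dice and by_hand are just illustrative. It suggests both values are internally consistent: jaccard_score with its default average='binary' scores only the positive class, tp / (tp + fp + fn), while the hand formula counts every matching position (tp + tn) as shared.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, jaccard_score

# Vectors copied from the printed output; the model's float output is cast to int labels
y = np.array([1, 0, 1, 0])
predictions = np.array([1.0, 0.0, 0.0, 0.0]).astype(int)

tn, fp, fn, tp = confusion_matrix(y, predictions).ravel()

# sklearn default (average='binary', pos_label=1): positive class only
by_function = jaccard_score(y, predictions)

# Same quantity from the confusion matrix counts
by_counts = tp / (tp + fp + fn)

# Set view: indices labelled 1 in y vs. indices labelled 1 in predictions
pos_true = {i for i, v in enumerate(y) if v == 1}
pos_pred = {i for i, v in enumerate(predictions) if v == 1}
by_sets = len(pos_true & pos_pred) / len(pos_true | pos_pred)

# Cross-check through the Dice coefficient computed in test(): J = D / (2 - D)
dice = 2 * tp / (2 * tp + fp + fn)
from_dice = dice / (2 - dice)

# The question's hand formula: matches (tp + tn) over 2n - matches,
# i.e. Jaccard over sets of (index, label) pairs, so true negatives count too
n = tp + tn + fp + fn
by_hand = (tp + tn) / (2 * n - (tp + tn))

print(by_function, by_counts, by_sets, from_dice, by_hand)
```

The first four values agree at 0.5; only the hand formula gives 0.6, because it rewards the two correctly predicted negatives as well.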

As a second issue, why does classification_report appear to treat nao_doentes (not sick, in Portuguese) as 1 and doentes (sick) as 0? Shouldn't it be the other way around? In my data (y), nao_doentes is 0 and doentes is 1.


  • According to the jaccard_score documentation, the score is defined as:

    the size of the intersection divided by the size of the union of two label sets,

    With binary targets, the default average='binary' only scores the positive class (pos_label=1). The docs also warn:

    jaccard_score may be a poor metric if there are no positives for some samples or classes. Jaccard is undefined if there are no true or predicted labels, and our implementation will return a score of 0 with a warning.

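    From your confusion matrix (tn=2, fp=0, fn=1, tp=1):

    intersection = tp          # 1
    union = tp + fp + fn       # 1 + 0 + 1 = 2
    jaccard = intersection / union

    which gives 1 / (1 + 0 + 1) = 0.5. Your hand formula differs because it also counts the true negatives as shared elements: (tp + tn) / (2*4 - (tp + tn)) = 3 / 5 = 0.6.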

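    Your labels are correct: classification_report applies target_names in sorted label order, so nao_doentes is class 0 and doentes is class 1. If you convert the numeric labels to names, you get the same confusion matrix:

    import pandas as pd

    names = ['nao_doentes', 'doentes']   # code 0 -> 'nao_doentes', code 1 -> 'doentes'
    prediction = [1, 0, 0, 0]
    y = [1, 0, 1, 0]
    y_lab = pd.Categorical.from_codes(y, categories=names)
    pred_lab = pd.Categorical.from_codes(prediction, categories=names)
    print(pd.crosstab(y_lab, pred_lab))  # rows: true labels, columns: predictions
    # col_0        nao_doentes  doentes
    # row_0
    # nao_doentes            2        0
    # doentes                1        1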
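A sketch for checking the label mapping and the per-class scores directly, assuming the same four-sample vectors as above. Passing labels=[0, 1] pins the order in which target_names are applied, output_dict=True makes the report inspectable, and average=None / pos_label=0 show the Jaccard score of each class (the variable names here are only illustrative).

```python
import numpy as np
from sklearn.metrics import classification_report, jaccard_score

y = np.array([1, 0, 1, 0])
predictions = np.array([1, 0, 0, 0])
names = ['nao_doentes', 'doentes']  # names[0] -> label 0, names[1] -> label 1

# labels=[0, 1] makes the label -> name pairing explicit
report = classification_report(y, predictions, labels=[0, 1],
                               target_names=names, output_dict=True)

# Class 0 (nao_doentes): predicted 0 at indices 1, 2, 3; truly 0 at 1, 3
# -> precision 2/3, recall 2/2
print(report['nao_doentes']['precision'], report['nao_doentes']['recall'])

# Class 1 (doentes): predicted 1 only at index 0; truly 1 at 0, 2
# -> precision 1/1, recall 1/2
print(report['doentes']['precision'], report['doentes']['recall'])

# Jaccard per class: class 0 -> 2/3 (indices {1, 3} vs {1, 2, 3}), class 1 -> 1/2
per_class = jaccard_score(y, predictions, average=None)
negative = jaccard_score(y, predictions, pos_label=0)   # score class 0 only
macro = jaccard_score(y, predictions, average='macro')  # unweighted mean of both
print(per_class, negative, macro)
```

These match the rows of the classification report in the question, which supports the mapping nao_doentes = 0 and doentes = 1.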