I've used sklearn.metrics.jaccard_score to collect the referred score from my Python model's binary classification test. It outputs as shown below, but when I calculate the metric by hand it yields a different value. Am I mistaken about the meaning of "jaccard" in this function, or am I using it wrong? All the other metrics collected by the sklearn functions return correct values.
Here is my code, with the Jaccard test done by hand (doing it on a calculator by comparing the vectors as sets yields the same value, which is only somewhat a relief):
from sklearn.metrics import classification_report, confusion_matrix, jaccard_score

def test(X, y, model):
    predictions = model.predict(X, verbose=1).ravel()
    report = classification_report(y, predictions, target_names=['nao_doentes', 'doentes'])
    confMatrix = confusion_matrix(y, predictions)
    tn, fp, fn, tp = confMatrix.ravel()
    jaccard = jaccard_score(y, predictions)  # Behaving strangely
    print(tn, fp, fn, tp)
    print(predictions)
    print(y)
    print(report)
    print(confMatrix)
    print("Jaccard by function: {}".format(jaccard))

    # Note that in binary classification, recall of the positive class is also known as "sensitivity";
    # recall of the negative class is "specificity".
    dice = ((2 * tp) / ((2 * tp) + fp + fn))
    jaccard = ((tp + tn) / ((2 * (tp + tn + fn + fp)) - (tp + tn)))
    print(dice)
    print("Jaccard by hand: {}".format(jaccard))
And here is the output:
2 0 1 1
[1. 0. 0. 0.]
[1 0 1 0]
              precision    recall  f1-score   support

 nao_doentes       0.67      1.00      0.80         2
     doentes       1.00      0.50      0.67         2

    accuracy                           0.75         4
   macro avg       0.83      0.75      0.73         4
weighted avg       0.83      0.75      0.73         4
[[2 0]
[1 1]]
Jaccard by function: 0.5
0.6666666666666666
Jaccard by hand: 0.6
As a second issue, why does classification_report appear to be treating nao_doentes (Portuguese for "not sick") as 1 and doentes (sick) as 0? Shouldn't it be the other way around? nao_doentes is set as 0 and doentes as 1 in my sets (so in y).
Looking at the help page, the Jaccard score is defined as:
the size of the intersection divided by the size of the union of two label sets,
and in the binary case it looks only at the positive class:
jaccard_score may be a poor metric if there are no positives for some samples or classes. Jaccard is undefined if there are no true or predicted labels, and our implementation will return a score of 0 with a warning.
From your confusion matrix, you have:
intersection = tp             # you have 1
union = tp + fp + fn          # you have 2
jaccard = intersection / union
which gives you 1 / (1 + 0 + 1) = 0.5.
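As a quick sanity check, here is a minimal sketch (using the y and prediction vectors from your printed output) that compares jaccard_score against tp / (tp + fp + fn):

from sklearn.metrics import confusion_matrix, jaccard_score

y = [1, 0, 1, 0]            # true labels, taken from your printed output
predictions = [1, 0, 0, 0]  # predicted labels, taken from your printed output

tn, fp, fn, tp = confusion_matrix(y, predictions).ravel()

# Jaccard of the positive class: |intersection| / |union| of the
# "predicted positive" and "actually positive" sets
manual = tp / (tp + fp + fn)

print(jaccard_score(y, predictions))  # 0.5
print(manual)                         # 0.5

Your hand formula mixes in tn, which the positive-class Jaccard never uses; that is where the 0.6 comes from.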
Your labels are correct. If you convert them, you will see that you get the same confusion matrix:
import pandas as pd

labels = pd.Categorical(['nao_doentes', 'doentes'], categories=['nao_doentes', 'doentes'])
prediction = [1, 0, 0, 0]
y = [1, 0, 1, 0]
pd.crosstab(labels[y], labels[prediction])
col_0        nao_doentes  doentes
row_0
nao_doentes            2        0
doentes                1        1
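If you want to confirm the mapping on the sklearn side as well, a minimal sketch (assuming the same four-sample vectors) is to pass labels explicitly, so that 0 is paired with 'nao_doentes' and 1 with 'doentes':

from sklearn.metrics import classification_report

y = [1, 0, 1, 0]
predictions = [1, 0, 0, 0]

# target_names are matched, in order, to the class labels given in `labels`,
# so 0 -> 'nao_doentes' and 1 -> 'doentes', exactly as encoded in your data
print(classification_report(y, predictions,
                            labels=[0, 1],
                            target_names=['nao_doentes', 'doentes']))

The nao_doentes row gets recall 1.00 (both class-0 samples are predicted correctly) and doentes gets 0.50, which is exactly what your report shows, so the names were already attached to the right classes.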