Tags: python, scikit-learn, scoring

Why are the outputs of jaccard_score and jaccard_similarity_score different?


When I use jaccard_similarity_score, I get this warning: "DeprecationWarning: jaccard_similarity_score has been deprecated and replaced with jaccard_score. It will be removed in version 0.23. This implementation has surprising behavior for binary and multiclass classification tasks."

The classic explanation of the Jaccard similarity score matches the output of the deprecated jaccard_similarity_score.

However, jaccard_score and jaccard_similarity_score return different results, even when I try different values of the average parameter, as shown below.

# Requires scikit-learn < 0.23 (jaccard_similarity_score was removed in 0.23).
from sklearn.metrics import jaccard_similarity_score, jaccard_score
y_pred = [0, 1, 0, 1, 0, 0, 0, 1, 0, 1]
y_true = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
print("jaccard_similarity_score=", jaccard_similarity_score(y_true, y_pred), '\n')
# Try each multiclass averaging strategy of the new function.
for param in ['weighted', 'micro', 'macro']:
    print(param, " jaccard_score=", jaccard_score(y_true, y_pred, average=param))

This is the output of the code above:

jaccard_similarity_score= 0.7 

weighted  jaccard_score= 0.5575  
micro  jaccard_score= 0.5384615384615384  
macro  jaccard_score= 0.5125 
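These numbers can be checked by hand. A minimal plain-Python sketch (no scikit-learn needed), assuming the per-class Jaccard index is TP / (TP + FP + FN), i.e. the intersection over the union of the "true is class c" and "predicted as class c" sample sets, and that the averages follow the usual macro/weighted/micro definitions; the helper confusion_counts is hypothetical, written only for illustration:

```python
# Recompute jaccard_score's averages by hand (plain Python, no scikit-learn).
# Assumption: per-class score J_c = TP_c / (TP_c + FP_c + FN_c).

y_true = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
y_pred = [0, 1, 0, 1, 0, 0, 0, 1, 0, 1]
labels = sorted(set(y_true) | set(y_pred))  # [0, 1]


def confusion_counts(y_true, y_pred, label):
    """Hypothetical helper: (TP, FP, FN) with `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, fp, fn


counts = {c: confusion_counts(y_true, y_pred, c) for c in labels}
per_class = {c: tp / (tp + fp + fn) for c, (tp, fp, fn) in counts.items()}
support = {c: y_true.count(c) for c in labels}  # occurrences in y_true

# macro: unweighted mean of the per-class scores
macro = sum(per_class.values()) / len(labels)
# weighted: per-class scores weighted by each class's support in y_true
weighted = sum(per_class[c] * support[c] for c in labels) / sum(support.values())
# micro: pool TP, FP and FN over all classes before dividing
tp_sum = sum(tp for tp, _, _ in counts.values())
fp_sum = sum(fp for _, fp, _ in counts.values())
fn_sum = sum(fn for _, _, fn in counts.values())
micro = tp_sum / (tp_sum + fp_sum + fn_sum)

print("per class:", per_class)  # class 0 -> 5/8, class 1 -> 2/5
print("macro   ", macro)
print("weighted", weighted)
print("micro   ", micro)
```

Class 0 scores 5/8 and class 1 scores 2/5, so macro is 0.5125, weighted (supports 7 and 3) is about 0.5575, and micro is 7/13, about 0.5385. These agree with the output above, which suggests jaccard_score is doing what its definition says.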

Is there an option that makes the results equal? Is the new jaccard_score working as expected?


Solution

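  • As you can see from the implementation at https://github.com/scikit-learn/scikit-learn/blob/a5d4c61/sklearn/metrics/classification.py#L311, for binary and multiclass input

    jaccard_similarity_score actually calculates accuracy: each sample has a single label, so its per-sample Jaccard similarity is 1 when the prediction matches and 0 otherwise, and the mean of those values is the fraction of correct predictions.

    jaccard_score, by contrast, computes TP / (TP + FP + FN) per class and then combines the classes according to average, so none of its average options reproduces 0.7 here. If 0.7 is the number you want, you want accuracy_score. So jaccard_similarity_score is not a good function here.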
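This can be checked without the removed function. A minimal plain-Python sketch, assuming (as the linked source suggests) that the old function compared y_true and y_pred element-wise for non-multilabel input; the helper set_jaccard is hypothetical. It treats each sample's single label as a set, shows that the mean per-sample Jaccard index equals accuracy, and contrasts that with the Jaccard index of the positive class:

```python
# Why the deprecated jaccard_similarity_score returned 0.7 for this input.
# Each sample has one label, so the per-sample Jaccard index of {t} and {p}
# is 1 if t == p and 0 otherwise.

y_true = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
y_pred = [0, 1, 0, 1, 0, 0, 0, 1, 0, 1]


def set_jaccard(a, b):
    """Hypothetical helper: classic Jaccard index, intersection size / union size."""
    return len(a & b) / len(a | b)


# Averaging the per-sample scores of singleton label sets ...
per_sample = [set_jaccard({t}, {p}) for t, p in zip(y_true, y_pred)]
sample_avg = sum(per_sample) / len(per_sample)

# ... gives exactly the fraction of matching labels, i.e. accuracy.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# The Jaccard index of the positive class instead compares the set of samples
# that are truly positive with the set of samples predicted positive.
true_pos = {i for i, t in enumerate(y_true) if t == 1}  # {3, 5, 7}
pred_pos = {i for i, p in enumerate(y_pred) if p == 1}  # {1, 3, 7, 9}
binary = set_jaccard(true_pos, pred_pos)

print(sample_avg, accuracy, binary)  # 0.7 0.7 0.4
```

In scikit-learn itself, accuracy_score(y_true, y_pred) gives the old 0.7, while jaccard_score(y_true, y_pred) with its default average='binary' (pos_label=1) gives the positive-class value, 0.4.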