Jaccard Similarity Between Null Sets


I want to compute the Jaccard similarity between two data sets based on the presence or absence of each code in a list of standard codes. For example, take three data sets x, y, and z. Data sets x and y don't have any standard codes (Null), so I set their list values to zeros.

 x = [0,0,0] 
 y = [0,0,0] 
 z = [0,1,0] 

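# jaccard_similarity_score exists only in older scikit-learn releases (it was removed in 0.23)
from sklearn.metrics import jaccard_similarity_score
print(jaccard_similarity_score(x, y), jaccard_similarity_score(x, z))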

The Jaccard similarity between x and z comes out as 0.67 (2/3), and between x and y as 1. Is there a similarity measure that handles the intersection of two empty sets? In my case, I want the similarity between data sets x and y to be 0, not 1 (as computed above).


Solution

  • It depends on the use case, but in your case I think you should keep the Jaccard similarity of x and y as 1 because, as you stated:

    Data sets x and y don't have any standard codes (Null)

    So one can argue that x and y are quite similar (neither of them has any standard codes). In any case, you should check whether the denominator of the fraction (the size of the union) is zero and handle that case explicitly, for example by returning a flag value such as -1.
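
The zero-denominator check suggested above could look roughly like the sketch below. It treats each list as a set of present codes (positions holding a 1) and computes |A ∩ B| / |A ∪ B| in plain Python. The function name `jaccard_binary` and the `empty_value` parameter are made up for this illustration and are not part of scikit-learn. Note that under this set-based definition x and z score 0, not 2/3; as far as I know, the older `jaccard_similarity_score` on flat binary lists counted matching positions, which is why it reported 2/3.

```python
def jaccard_binary(a, b, empty_value=0.0):
    """Set-based Jaccard similarity of two equal-length 0/1 lists.

    empty_value (hypothetical parameter) is returned when neither list
    contains a 1, i.e. when the union is empty and the ratio is undefined.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")

    # Codes present in both lists, and codes present in at least one list.
    intersection = sum(1 for p, q in zip(a, b) if p and q)
    union = sum(1 for p, q in zip(a, b) if p or q)

    if union == 0:
        # Both sets are empty: return the caller's chosen value.
        return empty_value
    return intersection / union


x = [0, 0, 0]
y = [0, 0, 0]
z = [0, 1, 0]

print(jaccard_binary(x, y))                  # 0.0 (empty vs. empty, chosen default)
print(jaccard_binary(x, y, empty_value=-1))  # -1 (flag value instead)
print(jaccard_binary(x, z))                  # 0.0 (no shared codes)
```

Passing `empty_value=1.0` instead would reproduce the "two empty sets are identical" convention argued for in the answer, so the choice stays with the caller.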