How to obtain jaccard similarity in matlab


I have a table:

   x   y   z 
A  2   0   3   
B  0   3   0    
C  0   0   4    
D  1   4   0

I want to calculate the Jaccard similarity in Matlab between the vectors A, B, C and D. The formula is:

jaccard(x, y) = |x intersect y| / (|x| + |y| - |x intersect y|)

In this formula, |x| and |y| denote the number of nonzero items in each vector. For example, |A| is 2, |B| and |C| are each 1, and |D| is 2.

|x intersect y| denotes the number of positions where both vectors are nonzero. |A intersect B| is 0. |A intersect D| is 1, because x is the only column that is nonzero in both.

For example: jaccard(A,D) = 1/3 = 0.33

How can I implement this in Matlab?


Solution

  • Matlab's Statistics Toolbox has a function that computes the Jaccard distance: pdist.

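    Here is an example using two random binary vectors:

    X = rand(2,100);   % two rows of 100 uniform random values
    X(X>0.5) = 1;      % values above 0.5 become 1
    X(X<=0.5) = 0;     % the remaining values become 0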
    
    JD = pdist(X,'jaccard')  % jaccard distance
    JI = 1 - JD;             % jaccard index
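
    One caveat for the table in the question: as far as I recall from the pdist documentation, the 'jaccard' option counts a coordinate as a mismatch whenever the two values differ, even if both are nonzero. Non-binary data therefore probably needs to be reduced to a zero/nonzero pattern first. A minimal sketch under that assumption (T, P, JD and JI are just illustrative names, and squareform also comes from the Statistics Toolbox):

    ```matlab
    % Rows are the vectors A, B, C, D from the question; columns are x, y, z
    T = [2 0 3;
         0 3 0;
         0 0 4;
         1 4 0];

    P = double(T ~= 0);                     % zero/nonzero pattern as 0/1 doubles
    JD = squareform(pdist(P, 'jaccard'));   % 4x4 matrix of pairwise Jaccard distances
    JI = 1 - JD;                            % pairwise Jaccard index; JI(1,4) is jaccard(A,D)
    ```

    With this pattern, JI(1,4) should come out as 1/3, matching the example in the question.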
    

    EDIT

    A calculation that does not require the Statistics Toolbox:

    a = X(1,:);
    b = X(2,:);
    JD = 1 - sum(a & b)/sum(a | b)
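
    The same idea can be extended to all pairs of rows at once, using |x intersect y| / (|x| + |y| - |x intersect y|) as described in the question. A minimal sketch in base Matlab, assuming the table is stored as a matrix with one row per vector (T, P, inter, n, uni and J are just illustrative names):

    ```matlab
    % Rows are the vectors A, B, C, D from the question; columns are x, y, z
    T = [2 0 3;
         0 3 0;
         0 0 4;
         1 4 0];

    P = double(T ~= 0);                   % 1 where an entry is nonzero, 0 elsewhere
    inter = P * P';                       % inter(i,j) = positions nonzero in both rows i and j
    n = sum(P, 2);                        % n(i) = number of nonzero entries in row i
    uni = bsxfun(@plus, n, n') - inter;   % |x| + |y| - |x intersect y|
    J = inter ./ uni;                     % pairwise Jaccard similarity; J(1,4) is jaccard(A,D)
    ```

    Here J(1,4) evaluates to 1/3, as in the question's example. A row containing only zeros would produce 0/0 = NaN in its entries, so that case needs separate handling if it can occur.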