Tags: language-agnostic, metrics, coefficients

Jaccard Coefficient


I have been given a formula to calculate the Jaccard Coefficient for two real vectors a and b of length n.

$$J(a, b) = \frac{\sum_{i=1}^{n} a_i b_i}{\sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i - \sum_{i=1}^{n} a_i b_i}$$

Is this formula correct? If I calculate the coefficient for the vectors {5, 3, 1, 0, 3} and {7, 1, 3, 2, 1}, I get a negative number (which I thought was not allowed for metrics).

(5*7 + 3*1 + 1*3 + 0*2 + 3*1) = 44

44 / (12 + 14 - 44) = 44 / -18 = -22/9


Solution

  • As originally defined by Jaccard, the similarity coefficient is the size of the intersection divided by the size of the union. Since both are sizes, a negative result obviously isn't possible.

    What you show in the question looks like the Jaccard similarity for bit vectors, where squaring changes nothing because 0 and 1 are their own squares. For real-valued vectors, however, you need to square each of the terms on the left in the denominator, usually shown something like this:

    $$T(A, B) = \frac{A \cdot B}{\lVert A \rVert^2 + \lVert B \rVert^2 - A \cdot B}$$

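    I suspect the lack of squaring is what's leading to the problem you're currently seeing. Without it, the denominator easily goes negative once the entries grow beyond 1. More specifically, a single term (A + B) - (A * B) equals 1 - (A - 1)(B - 1), so it is positive only when (A - 1)(B - 1) < 1; as soon as both A and B are 2 or more, the term is zero or negative. With the squares, each term becomes A^2 + B^2 - A * B = (A - B/2)^2 + 3B^2/4, which is never negative, so the denominator can't be negative either.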
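As a quick check, here is a minimal Python sketch comparing the unsquared formula from the question with the squared version (often called the Tanimoto coefficient). The function names are mine, purely for illustration, and I'm assuming integer entries so that `Fraction` keeps the arithmetic exact; for real-valued input you would use plain division instead.

```python
from fractions import Fraction


def jaccard_unsquared(a, b):
    """Formula as used in the question: no squares in the denominator.

    Assumes integer entries; raises ZeroDivisionError if the denominator is 0.
    """
    dot = sum(x * y for x, y in zip(a, b))
    return Fraction(dot, sum(a) + sum(b) - dot)


def tanimoto(a, b):
    """Squared version: dot / (|a|^2 + |b|^2 - dot).

    Assumes integer entries; raises ZeroDivisionError if both vectors are all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return Fraction(dot, denom)


a = [5, 3, 1, 0, 3]
b = [7, 1, 3, 2, 1]

print(jaccard_unsquared(a, b))  # -22/9, the negative value from the question
print(tanimoto(a, b))           # 11/16

# On bit vectors the two formulas agree, because 0 and 1 equal their own squares.
u = [1, 0, 1, 1]
v = [1, 1, 0, 1]
print(jaccard_unsquared(u, v), tanimoto(u, v))  # 1/2 1/2
```

For the vectors in the question, the squared denominator is 44 + 64 - 44 = 64, so the coefficient comes out as 44/64 = 11/16 rather than -22/9. The bit-vector case matches the set definition: the two vectors share 2 positions out of 4 in their union, giving 1/2.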