Tags: python, numpy, metrics, similarity

Similarity Metric using numpy


I am trying to define my own similarity metric, inspired by the Jaccard similarity score. The only addition I want over the Jaccard metric is that it also takes the frequency of each label into account. For that purpose I have written this code snippet:

u = [12, 0, 3]
v = [24, 6, 1]
num = 0  # total frequency of labels present in both u and v
den = 0  # total frequency of all labels
for a, b in zip(u, v):
    if a != 0 and b != 0:
        num += a + b
    den += a + b
print(1 - num / den)

So my questions are:

  1. Can this be done with NumPy's bitwise operators?
  2. Are there other similarity metrics I could use? I have tried cosine similarity. Which would be more helpful?

Solution

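  • An approach using NumPy's vectorized operations:

    import numpy as np

    arr = np.array([u, v])  # u, v as defined in the question

    s = arr.sum(0)  # combined frequency per label
    (s * (arr == 0).any(0)).sum() / s.sum()  # share of frequency on labels missing from u or v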
    

    Output:

    0.13043478260869565
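
  • On the first question, the same quantity can also be written with NumPy's bitwise & operator applied to boolean masks, which mirrors the "u[i] != 0 and v[i] != 0" test in the original loop. The following is a minimal sketch, not a definitive implementation: the function name freq_jaccard_distance is hypothetical, and returning 0.0 when both vectors are all zeros is an assumption, since the ratio is undefined in that case.

```python
import numpy as np


def freq_jaccard_distance(u, v):
    """Return 1 - (frequency of labels present in both) / (total frequency).

    The name and the all-zero fallback are illustrative assumptions.
    """
    u = np.asarray(u)
    v = np.asarray(v)
    s = u + v                    # combined frequency per label
    both = (u != 0) & (v != 0)   # bitwise AND of the two boolean masks
    total = s.sum()
    if total == 0:               # assumption: treat two empty vectors as distance 0.0
        return 0.0
    return 1 - s[both].sum() / total


u = [12, 0, 3]
v = [24, 6, 1]
print(freq_jaccard_distance(u, v))  # about 0.1304 for the vectors in the question
```

    The mask form keeps the condition readable; on boolean arrays, & behaves like an element-wise logical AND, so the parentheses around each comparison are required.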