Tags: python, arrays, distance, similarity

Determine the similarity between two arrays of counts


The Problem: I am trying to determine the similarity between two 1D arrays composed of counts. Both the positions and relative magnitudes of the counts inside the arrays are important.

X = [1, 5, 10, 0,  0, 0, 2]
Y = [1, 2,  0, 0, 10, 0, 5]
Z = [1, 3,  8, 0,  0, 0, 1]

In this case, array X is more similar to array Z than to array Y.

I have tried a few metrics, including cosine distance, earth mover's distance (EMD), and histogram intersection. While cosine distance and EMD both work decently, only EMD really satisfies both of my conditions.

I am curious to know whether there are other algorithms or distance metrics that address this sort of problem.

Thank you!


Solution

  • One popular and simple method is the root-mean-square (RMS) difference: square the differences between corresponding elements, take the mean of those squares, and then take the square root. In your case, X vs Y produces about 5.6, and X vs Z produces about 1.1, so X is closer to Z.

    import math

    X = [1, 5, 10, 0,  0, 0, 2]
    Y = [1, 2,  0, 0, 10, 0, 5]
    Z = [1, 3,  8, 0,  0, 0, 1]

    def rms(a, b):
        # Root-mean-square difference: square root of the mean of the
        # squared element-wise differences.
        return math.sqrt(sum((a1 - b1) ** 2 for a1, b1 in zip(a, b)) / len(a))

    print(rms(X, Y))  # larger value: X and Y differ more
    print(rms(X, Z))  # smaller value: X and Z are closer
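For comparison, the three metrics mentioned in the question can be sketched in plain Python as well. This is only a minimal illustration under a few assumptions that are not in the original post: the arrays are treated as histograms over equally spaced bins, they are normalized to sum to 1 before computing EMD and histogram intersection, and the function names (`normalize`, `cosine_distance`, `emd_1d`, `histogram_intersection`) are made up for this example.

```python
import math
from itertools import accumulate

X = [1, 5, 10, 0,  0, 0, 2]
Y = [1, 2,  0, 0, 10, 0, 5]
Z = [1, 3,  8, 0,  0, 0, 1]

def normalize(a):
    # Assumption: scale counts so they sum to 1 (a discrete distribution).
    total = sum(a)
    return [v / total for v in a]

def cosine_distance(a, b):
    # 1 - cosine similarity; ignores overall scale and how far apart bins are.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def emd_1d(a, b):
    # 1D earth mover's distance between normalized histograms, assuming
    # unit spacing between bins: the sum of absolute differences between
    # the two cumulative distributions.
    cdf_a = accumulate(normalize(a))
    cdf_b = accumulate(normalize(b))
    return sum(abs(x - y) for x, y in zip(cdf_a, cdf_b))

def histogram_intersection(a, b):
    # Overlap of the normalized histograms; 1.0 means identical shape.
    # Note this is a similarity (higher is closer), not a distance.
    return sum(min(x, y) for x, y in zip(normalize(a), normalize(b)))

for name, other in (("Y", Y), ("Z", Z)):
    print(f"X vs {name}:",
          "cosine distance =", round(cosine_distance(X, other), 3),
          "EMD =", round(emd_1d(X, other), 3),
          "intersection =", round(histogram_intersection(X, other), 3))
```

All three rank Z as closer to X than Y on this data. Of the three, only the EMD sketch takes into account how far counts have to move between positions, which is consistent with the observation in the question that EMD best captures both position and magnitude.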