Tags: numpy, linear-algebra, cosine-similarity

Is the cosine similarity of the average vector the same as the average of the cosine similarities?


I have a vector v that I want to compare with a set of vectors U = [u1, u2, u3, ...]. I want to find the average cosine similarity between v and all vectors in U.

My first thought was to compute:

    s1 = cosine_similarity(v, u1)
    s2 = cosine_similarity(v, u2)
    ...

and then take the average as

    s = np.mean([s1, s2, s3, ...])

But I was wondering whether this gives the same result as first averaging the vectors in U to get a single vector u and then computing

    u = np.mean(U, axis=0)  # average row-wise; without axis=0 this collapses U to a scalar
    s = cosine_similarity(v, u)
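Both computations can be written without an explicit loop. The sketch below assumes scikit-learn is available and uses a small random U (five rows, made up here purely as example data); `cosine_similarity` expects 2-D inputs, so the single vectors are reshaped into one-row arrays.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
v = rng.random(100)       # the query vector
U = rng.random((5, 100))  # hypothetical example set: one vector u_i per row

# Approach 1: mean of the individual similarities.
# The result has shape (1, len(U)): one similarity per row of U.
sims = cosine_similarity(v.reshape(1, -1), U)
mean_of_cos = sims.mean()

# Approach 2: similarity with the average vector.
# axis=0 averages the rows element-wise; np.mean(U) alone would return a scalar.
u = U.mean(axis=0)
cos_of_mean = cosine_similarity(v.reshape(1, -1), u.reshape(1, -1))[0, 0]

print(mean_of_cos, cos_of_mean)
```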

Are the results the same in both cases?


Solution

  • Short answer: no. You can check this with a simple example that compares the mean of the individual similarities with the similarity to the mean vector:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity
    
    x1 = np.random.rand(1, 100)
    x2 = np.random.rand(1, 100)
    
    y = np.random.rand(1, 100)
    
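    # Mean of the individual cosine similarities
    print((cosine_similarity(x1, y) + cosine_similarity(x2, y)) / 2)
    
    # Cosine similarity with the mean vector
    # (scaling m does not change the cosine, but /2 makes it a true mean)
    m = (x1 + x2) / 2
    print(cosine_similarity(m, y))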
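The reason becomes visible once both quantities are written out with cos(a, b) = a·b / (‖a‖ ‖b‖). In the derivation below, n is the number of vectors in U and ū is their mean; both symbols are notation introduced here for the sketch.

```latex
\bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i

\text{mean of similarities:}\quad
\frac{1}{n}\sum_{i=1}^{n}\cos(v, u_i)
  = \frac{1}{n}\sum_{i=1}^{n}\frac{v \cdot u_i}{\lVert v\rVert\,\lVert u_i\rVert}

\text{similarity with the mean:}\quad
\cos(v, \bar{u})
  = \frac{v \cdot \bar{u}}{\lVert v\rVert\,\lVert \bar{u}\rVert}
  = \frac{1}{n}\sum_{i=1}^{n}\frac{v \cdot u_i}{\lVert v\rVert\,\lVert \bar{u}\rVert}

\text{if } \lVert u_i\rVert = 1 \text{ for all } i:\quad
\frac{1}{n}\sum_{i=1}^{n}\cos(v, u_i)
  = \frac{v \cdot \bar{u}}{\lVert v\rVert}
  = \lVert \bar{u}\rVert\,\cos(v, \bar{u})
```

In the first expression each term is scaled by its own 1/‖u_i‖, while the second uses the single factor 1/‖ū‖, so the two agree only in special cases, for example when all u_i point in the same direction. If the u_i are normalized to unit length first, the average similarity equals the similarity with the mean vector multiplied by ‖ū‖, which is at most 1.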