I have a vector v which I want to compare with a set of vectors U = [u1, u2, u3 ...]. I want to find the average cosine similarity of v with all vectors in U.
My first thought was to compute:
s1 = cosine_similarity(v, u1)
s2 = cosine_similarity(v, u2)
...
and then take the average as
s = np.mean([s1, s2, s3 ...])
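For concreteness, here is a vectorized sketch of this first approach (the random data and shapes are just placeholders):
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

v = np.random.rand(1, 100)    # placeholder for my vector, as a 1 x d row
U = np.random.rand(10, 100)   # placeholder for the set of vectors, stacked into an n x d array
s = cosine_similarity(v, U).mean()   # mean of the n pairwise similarities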
But I was wondering if this process is the same as just taking the average of U to get a single vector u and then computing
u = np.mean(U, axis=0)   # element-wise mean of the vectors in U
s = cosine_similarity(v, u)
Are the results the same in both cases?
Short answer: no. You can check this with a simple example:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

x1 = np.random.rand(1, 100)
x2 = np.random.rand(1, 100)
y = np.random.rand(1, 100)

# average of the individual similarities
print(np.mean([cosine_similarity(x1, y), cosine_similarity(x2, y)]))

# similarity with the average vector
m = (x1 + x2) / 2
print(cosine_similarity(m, y))
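The reason is that cosine_similarity divides each term by that particular vector's norm, whereas the averaged vector only receives a single normalization, so the per-vector norms do not cancel out. If every u_i is first L2-normalized, the two quantities differ exactly by the norm of the mean of the normalized vectors. A minimal sketch of that relationship (the data and shapes are arbitrary):
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
v = rng.random((1, 100))
U = rng.random((10, 100))

# mean of the individual similarities vs. similarity with the mean vector
print(cosine_similarity(v, U).mean())
print(cosine_similarity(v, U.mean(axis=0, keepdims=True))[0, 0])   # generally different

# with unit-length rows, the two differ exactly by ||mean of the normalized rows||
U_hat = U / np.linalg.norm(U, axis=1, keepdims=True)
u_bar = U_hat.mean(axis=0, keepdims=True)
lhs = cosine_similarity(v, U_hat).mean()
rhs = cosine_similarity(v, u_bar)[0, 0] * np.linalg.norm(u_bar)
print(np.isclose(lhs, rhs))   # True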