Search code examples
algorithmmathcosine-similarity

Finding the best cosine similarity in a set of vectors


I have n vectors, each with m elements (real number). I want to find the pair where there cosine similarity is maximum among all pairs.

The straightforward solution would require O(n2m) time.

Is there any better solution?

update

Cosine similarity / distance and triangle equation Inspires me that I could replace "cosine similarity" with "chord length" which loses precision but increases speed a lot. ( there are many existing solutions solving Nearest Neighbor in metric space, like ANN )


Solution

  • Cosine similarity sim(a,b) is related to Euclidean distance |a - b| by

    |a - b|² = 2(1 - sim(a,b))
    

    for unit vectors a and b.

    That means cosine similarity is greatest when Euclidean distance is smallest after normalizing by the L2 norm, and the problem reduces to the closest pair of points problem, which can be solved in O(n lg n) time.