
A proper vector similarity index


I'm trying to adjust cosine similarity to determine how similar two vectors are with respect to their entries. Since cosine similarity is invariant under vector scaling (for example, (0, 1, 2) and (0, 2, 4) have a cosine similarity of 1), how could I extend the measure to account for the scale of the original vectors? I thought of multiplying by min{|v1|, |v2|}/max{|v1|, |v2|}, where |v| denotes the norm of a vector v, so that the bounds of -1 and 1 are preserved. Any suggestions are highly appreciated.
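
A minimal sketch of the scaling proposed above, assuming the Euclidean norm and non-zero vectors. The function names `norm` and `scaled_cosine_similarity` are made up for illustration and do not come from any library:

```python
import math


def norm(v):
    """Euclidean norm |v| of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def scaled_cosine_similarity(v1, v2):
    """Cosine similarity multiplied by min(|v1|, |v2|) / max(|v1|, |v2|).

    The ratio lies in (0, 1], so the result stays within [-1, 1].
    Assumption: neither vector is the zero vector.
    """
    n1, n2 = norm(v1), norm(v2)
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("similarity is undefined for a zero vector")
    cosine = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
    return cosine * min(n1, n2) / max(n1, n2)
```

With this sketch, (0, 1, 2) and (0, 2, 4) no longer score 1: the cosine is still 1, but the norm ratio is 1/2, so the scaled value is about 0.5.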


Solution

  • Well, cosine similarity is based on the angle between the two vectors, which doesn't depend on their lengths. If you need something that takes the lengths into account, then you need to think about how vector length should influence similarity in your context.

    Also note that you can always post-process a similarity or distance measure if you need to stay within certain bounds (like [-1, 1]). A popular function for such transforms is the arctan.

    For example, instead of extending the cosine similarity you could try the Euclidean distance with an appropriate transformation:

    d = Euclidean distance between your vectors   (d >= 0)
    similarity = 1 - 2 * arctan(d) / (pi/2)       (1 when d = 0, approaches -1 as d grows)
    

    But as I said, the "correct" formula depends on your context.
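
The Euclidean-distance transform above can be sketched in Python as follows. This is only an illustration: the name `euclidean_similarity` is made up here, and `math.dist` assumes Python 3.8 or later.

```python
import math


def euclidean_similarity(v1, v2):
    """Map the Euclidean distance d >= 0 into (-1, 1] via arctan.

    arctan(d) / (pi/2) lies in [0, 1), so the result is 1 for
    identical vectors and approaches -1 as the distance grows.
    """
    d = math.dist(v1, v2)  # Euclidean distance (Python 3.8+)
    return 1 - 2 * math.atan(d) / (math.pi / 2)
```

Unlike cosine similarity, this measure does separate (0, 1, 2) from (0, 2, 4), because their distance is non-zero. Note that a distance of exactly 1 maps to a similarity of 0, so the result depends on the units of the vectors; rescaling d before applying arctan may be needed, depending on the context.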