
Python GloVe similarity measure calculation


I am trying to understand how glove-python computes the most similar terms.

Is it using cosine similarity?

I am looking at the example in the glove-python GitHub repository: https://github.com/maciejkula/glove-python/tree/master/glove

I know that in gensim's word2vec, the most_similar method ranks words by cosine similarity.


Solution

  • The project website is a bit unclear on this point:

    The Euclidean distance (or cosine similarity) between two word vectors provides an effective method for measuring the linguistic or semantic similarity of the corresponding words.

    Euclidean distance is not the same as cosine similarity. The site implies that either works well enough, but it does not say which one the library actually uses.
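    The two measures can rank neighbours differently. Here is a minimal NumPy sketch with made-up toy vectors (nothing here comes from the project): Euclidean distance is sensitive to vector length, while cosine similarity only looks at direction. Once every vector is scaled to unit length, the squared Euclidean distance equals 2 - 2 * cos, so both then produce the same ranking.

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Straight-line distance between the two points
    return float(np.linalg.norm(a - b))

def unit(v):
    # Scale a vector to length 1
    return v / np.linalg.norm(v)

# Hypothetical toy vectors: b points the same way as query but is longer,
# c is close to query in space but points in a different direction.
query = np.array([1.0, 1.0])
b = np.array([3.0, 3.0])
c = np.array([1.0, 0.5])

cos_b = cosine_similarity(query, b)   # about 1.0 (same direction)
cos_c = cosine_similarity(query, c)   # about 0.95
euc_b = euclidean_distance(query, b)  # about 2.83
euc_c = euclidean_distance(query, c)  # 0.5

# Cosine similarity says b is the closer word; Euclidean distance says c.
print("cosine:    b =", round(cos_b, 3), " c =", round(cos_c, 3))
print("euclidean: b =", round(euc_b, 3), " c =", round(euc_c, 3))

# After normalizing to unit length, squared distance = 2 - 2 * cosine,
# so a smaller distance always means a larger cosine similarity.
for v in (b, c):
    d2 = euclidean_distance(unit(query), unit(v)) ** 2
    assert np.isclose(d2, 2 - 2 * cosine_similarity(query, v))
```

    So on raw vectors the choice of measure matters, which is why it is worth checking which one the library uses.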

    However, the source code of the repository you linked shows which one is used:

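    # Dot product of every word vector with the query vector, divided by
    # both vectors' norms: cosine similarity against the whole vocabulary.
    dst = (np.dot(self.word_vectors, word_vec)
           / np.linalg.norm(self.word_vectors, axis=1)
           / np.linalg.norm(word_vec))
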

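    Dividing the dot product by the norms of both vectors is exactly the definition of cosine similarity, so glove-python ranks words by cosine similarity, just like gensim.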
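    To see the quoted expression working outside the class, here is a minimal standalone sketch. The function name `most_similar`, its parameters, the tiny vocabulary and the vectors are all hypothetical stand-ins (in the library the matrix is `self.word_vectors`); only the `dst` expression mirrors the quoted source. Skipping the query word in the results is also an assumption made for this sketch, not a claim about what the library returns.

```python
import numpy as np

def most_similar(word_vectors, dictionary, word, number=2):
    """Return the `number` words closest to `word` by cosine similarity.

    word_vectors: (vocab_size, dim) array, one row per word.
    dictionary:   maps word -> row index in word_vectors.
    """
    inverse = {i: w for w, i in dictionary.items()}
    word_vec = word_vectors[dictionary[word]]

    # Same expression as the quoted source: dot products with every row,
    # divided by each row's norm and by the query vector's norm.
    dst = (np.dot(word_vectors, word_vec)
           / np.linalg.norm(word_vectors, axis=1)
           / np.linalg.norm(word_vec))

    # Highest similarity first; skip the query word itself (an assumption).
    order = np.argsort(-dst)
    return [(inverse[i], float(dst[i])) for i in order
            if inverse[i] != word][:number]

# Hypothetical toy embeddings: two "royalty" words and two "fruit" words.
dictionary = {"king": 0, "queen": 1, "apple": 2, "banana": 3}
word_vectors = np.array([
    [0.9, 0.8, 0.1],
    [0.85, 0.9, 0.15],
    [0.1, 0.2, 0.9],
    [0.05, 0.25, 0.95],
])

result = most_similar(word_vectors, dictionary, "king")
print(result)
```

    Because every row is divided by its own norm, a word vector's length has no effect on its rank; only its direction relative to the query matters.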