Search code examples
pythontextgensimsentence-similarity

What does Similarity Score mean in gensim?


I have used Gensim library to find the similarity between a sentence against a collection of paragraphs, a dataset of texts. I have used Cosine similarity, Soft cosine similarity and Mover measures separately. Gensim returns a list of items including docid and similarity score. For Cosine similarity and Soft cosine similarity, I guess the similarity score is the cosine between the vectors. Am I right?

In Gensim documents, they wrote it is the semantic relatedness, and no extra explanation. I have search a lot, but did not find any answer. Any help please


Solution

  • Usually by 'similarity', people are seeking to find a measure semantic relatedness - but whether the particular values calculated achieve that will depend on lots of other factors, such as the sufficiency of training data & choice of other appropriate parameters.

    Within each code context, 'similarity' has no more and no less meaning than how it's calculated right there - usually, that's 'cosine similarity between vector representations'. (When there's no other hints it means something different, 'cosine similarity' is typically a safe starting assumption.)

    But really: the meaning of 'similarity' at each use is no more and no less than whatever that one code path's docs/source-code dictate.

    (I realize that may seem an indirect & unsatisfying answer. If there are specific uses in context in Gensim source/docs/example where the meaning is unclear, you could point those out & I might be able to clarify those more.)