nlp, word2vec, sentence-similarity

How to find similar sentences in a corpus using word2vec?


I have implemented word2vec on my corpus using the TensorFlow tutorial: https://www.tensorflow.org/tutorials/text/word2vec#next_steps. Now I want to give a sentence as input and find similar sentences in the corpus.

Any leads on how I can perform this?


Solution

  • A simple word2vec model is not capable of such a task, as it only relates the semantics of individual words to each other, not the semantics of whole sentences. Inherently, such a model has no generative function; it only serves as a look-up table.

    Word2vec models map word strings to vectors in the embedding space. To find words similar to a given sample word, one can simply go through all vectors in the vocabulary and find the ones closest to the sample word's vector (in terms of the 2-norm, i.e. Euclidean distance; cosine similarity is also commonly used). For further information you could go here or here.

    However, this does not work for sentences, as it would require a whole vocabulary of sentences from which to pick similar ones, which is not feasible.

    Edit: This seems to be a duplicate of this question.
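The word-level nearest-neighbour search described in this answer (look up a word's vector, then compare it against every vector in the vocabulary by 2-norm distance) can be sketched in a few lines of NumPy. This is a minimal sketch, not code from the TensorFlow tutorial: it assumes `embeddings` is the trained embedding matrix with one row per word and `vocab` is a list whose positions match those rows (for example, the weights of the tutorial model's embedding layer and the vocabulary of its text-vectorization layer). The function name `most_similar` and the toy vectors are hypothetical, for illustration only.

```python
import numpy as np


def most_similar(word, vocab, embeddings, top_k=3):
    """Return the top_k vocabulary words closest to `word` by Euclidean (2-norm) distance.

    Assumes row i of `embeddings` is the vector for vocab[i].
    """
    index = {w: i for i, w in enumerate(vocab)}
    if word not in index:
        raise KeyError(f"{word!r} is not in the vocabulary")
    query_id = index[word]
    query = embeddings[query_id]  # plain look-up: word -> row of the matrix
    # Distance from the query vector to every vector in the vocabulary.
    distances = np.linalg.norm(embeddings - query, axis=1)
    order = np.argsort(distances)
    # Skip the query word itself (its distance is 0).
    return [(vocab[i], float(distances[i])) for i in order if i != query_id][:top_k]


# Hypothetical toy vocabulary and 2-d embeddings, for illustration only.
vocab = ["king", "queen", "man", "woman", "apple"]
embeddings = np.array([
    [0.9, 0.8],
    [0.85, 0.75],
    [0.5, 0.2],
    [0.45, 0.25],
    [-0.7, 0.1],
])

print([w for w, _ in most_similar("king", vocab, embeddings, top_k=2)])  # ['queen', 'woman']
```

To rank by cosine similarity instead, one could L2-normalize the rows of `embeddings` first and sort by the dot product with the normalized query vector (largest first). Note that this only answers "which words are close to this word"; as the answer explains, it does not by itself give you similar sentences.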