python nlp gensim word2vec doc2vec

Find similarity with doc2vec like word2vec


Is there a way to find similar documents with doc2vec, the way we find similar words with word2vec?

Like:

  model2.most_similar(positive=['good','nice','best'],
    negative=['bad','poor'],
    topn=10)

I know we can use infer_vector and feed the resulting vector in to get similar documents, but I want to supply many positive and negative examples, as we do in word2vec.

Is there any way to do that? Thanks!


Solution

  • The doc-vectors part of a Doc2Vec model works just like word-vectors, with respect to a most_similar() call. You can supply multiple doc-tags or full vectors inside both the positive and negative parameters.

    So you could call...

    sims = d2v_model.docvecs.most_similar(positive=['doc001', 'doc009'], negative=['doc102'])
    

    ...and it should work. The elements of the positive or negative lists could be doc-tags that were present during training, or raw vectors (like those returned by infer_vector(), or your own averages of multiple such vectors).