Tags: gensim, word2vec, doc2vec

Word vectors from a whole doc2vec model vs. word vectors from a particular document


I trained a gensim Doc2Vec model with the default PV-DM training mode (dm=1). I can get the word vectors of the global model from model.wv.vectors. But the documentation says that the same word ("leaves" in its example) won't have the same vector depending on the document context in which it appears.

So I'm a bit confused: in model.wv.vectors, will the word "leaves", for example, have the same vector across all the documents used to train the model (which would contradict my reading of the documentation)? If not, how do I get the word vectors for a particular document?


Solution

  • That documentation is misleading. The word-token 'leaves' will have only one word-vector in that model.

    I'm guessing the author of that comment may have meant that during model-training in PV-DM mode (dm=1), the training-predictions are influenced by a combination of the word-vector and the 'floating' doc-vector for that text (and other neighboring word-vectors within the context-window). But still, each word has just one vector, and the description there is confusing.
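    To make this concrete, here is a minimal sketch with a hypothetical two-document toy corpus (real training needs far more data). It shows that model.wv holds exactly one vector per word, shared across all documents, while the per-document vectors live separately in model.dv (gensim 4.x API):

    ```python
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Hypothetical toy corpus: 'leaves' appears in two different contexts.
    docs = [
        TaggedDocument(words=["the", "tree", "leaves", "fall"], tags=["doc0"]),
        TaggedDocument(words=["he", "leaves", "the", "house"], tags=["doc1"]),
    ]

    # PV-DM mode (dm=1), small vector_size just for illustration.
    model = Doc2Vec(docs, vector_size=16, min_count=1, epochs=5, dm=1)

    # There is exactly one word-vector for 'leaves', regardless of which
    # document(s) it appeared in during training:
    print(model.wv["leaves"].shape)  # (16,)

    # What varies per document is the doc-vector, keyed by tag:
    print(model.dv["doc0"].shape)    # (16,)
    print(model.dv["doc1"].shape)    # (16,)
    ```

    So there is no "word vector for 'leaves' in doc0" to retrieve; if you want a document-specific representation, the doc-vector in model.dv is what the model learns per document.
    
    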