gensim, word2vec

Combining Doc2Vec sentences into paragraph vectors


In Gensim's Doc2Vec, how do you combine sentence vectors to make a single vector for a paragraph? I realise you can train on the entire paragraph, but it would obviously be better to train on individual sentences, for context, etc. (I think...?)

Any advice or normal use case?

Also, how would I retrieve sentence/paragraph vectors from the model?


Solution

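  • Doc2Vec's architecture itself doesn't involve any sentence parsing: it learns one vector per tagged document, whatever that document contains. So it makes sense to train and test on the entire paragraph.

    In the original paper, the authors report results obtained by simply treating an entire paragraph as one sentence, and these outperformed the existing techniques.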