gensim word2vec doc2vec

Can I preserve the random state of a Doc2Vec model for each document I want to infer by inferring all documents at the same time?


Is there a way, using Gensim Doc2Vec, to infer multiple documents at the same time so that the random state of the model is preserved?

The function infer_vector is defined as

infer_vector(doc_words, alpha=None, min_alpha=None, epochs=None, steps=None)

where doc_words (list of str) is the document for which the vector representation will be inferred. I could not find any other option to infer multiple documents at the same time.
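Because infer_vector() takes a single document (a list of str tokens), the straightforward way to handle many documents is a plain loop. Below is a minimal sketch of such a convenience wrapper. `infer_many` is a hypothetical helper, not part of Gensim, and `model` is assumed to be an already-trained `gensim.models.doc2vec.Doc2Vec`. Keyword arguments such as `epochs` or `alpha` are forwarded unchanged to every infer_vector() call.

```python
from typing import Any, Iterable, List, Sequence


def infer_many(model: Any, documents: Iterable[Sequence[str]], **infer_kwargs: Any) -> List[Any]:
    """Infer one vector per document by calling model.infer_vector() in a loop.

    Hypothetical helper: `model` is assumed to be a trained gensim Doc2Vec
    model (or any object with a compatible infer_vector method). Extra keyword
    arguments (e.g. epochs, alpha) are passed through to every call.
    """
    vectors = []
    for words in documents:
        # Passing a raw string instead of a token list is a common mistake,
        # so reject it explicitly before calling the model.
        if isinstance(words, str):
            raise TypeError("each document must be a list of str tokens, not a str")
        vectors.append(model.infer_vector(list(words), **infer_kwargs))
    return vectors
```

For example, `infer_many(model, [["human", "interface"], ["graph", "trees"]], epochs=50)` returns a list of two vectors, one per document, in input order. Each call is still an independent inference, so this adds convenience, not a shared random state.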


Solution

  • There's no current option to infer multiple documents at once. It's one of many wishlist improvements for infer_vector() (collected in an open issue), but there's no work in progress or targeted release for that to arrive.

    I'm not sure what you mean by "preserve the random state of the model". The main motivations for batching that I can see would be user convenience, or added performance via multithreading.

    If what you really want is deterministic inference, see an answer in the Gensim FAQ which explains why deterministic Doc2Vec inference isn't necessarily a good idea. (It also includes a link to an issue with some ideas for how to force it, if you're determined to do that despite the good reasons not to.)
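For the performance motivation mentioned in the answer, one option is to spread the per-document calls over a thread pool. This is only a sketch under two assumptions that the Gensim documentation quoted here does not promise: that concurrent infer_vector() calls on a trained model are safe, and that the compiled inference routines release Python's GIL so the threads actually overlap. `infer_many_threaded` and its `workers` parameter are hypothetical names; benchmark it against a simple loop before relying on it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List, Sequence


def infer_many_threaded(
    model: Any,
    documents: Sequence[Sequence[str]],
    workers: int = 4,
    **infer_kwargs: Any,
) -> List[Any]:
    """Run model.infer_vector() for many documents on a thread pool.

    Hypothetical helper. Assumes (not guaranteed by the Gensim docs) that
    concurrent infer_vector() calls on an already-trained model are safe and
    that the compiled inference code releases the GIL so threads can overlap.
    """

    def infer_one(words: Sequence[str]) -> Any:
        # Each call infers a single document, exactly as in a plain loop.
        return model.infer_vector(list(words), **infer_kwargs)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, even if calls finish out of order.
        return list(pool.map(infer_one, documents))
```

Because ThreadPoolExecutor.map returns results in input order, the i-th vector still belongs to the i-th document.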
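If the underlying concern is that inference is not deterministic, it can help to first measure how much the vectors for one document actually vary between calls. The sketch below infers the same token list several times and returns the pairwise cosine similarities between the results. `inference_spread` and `cosine` are hypothetical helpers, and `model` is again assumed to be a trained Doc2Vec model. Similarities close to 1.0 indicate that the run-to-run jitter is small relative to the vectors themselves.

```python
import math
from itertools import combinations
from typing import Any, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def inference_spread(model: Any, words: Sequence[str], repeats: int = 5, **infer_kwargs: Any) -> List[float]:
    """Infer the same document `repeats` times; return all pairwise cosine similarities.

    Hypothetical helper: `model` is assumed to be a trained gensim Doc2Vec
    model, and extra keyword arguments (e.g. epochs) go to infer_vector().
    """
    # list(...) lets numpy arrays and plain lists be handled the same way.
    vectors = [list(model.infer_vector(list(words), **infer_kwargs)) for _ in range(repeats)]
    return [cosine(u, v) for u, v in combinations(vectors, 2)]
```

For instance, comparing `inference_spread(model, words, epochs=20)` with a run that uses a larger `epochs` value shows whether more inference passes make the results more stable for your data.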