python, machine-learning, data-science, word2vec, doc2vec

How does Doc2Vec create a vector for a sentence?


I am working with Doc2Vec for text classification. It creates a vector of a given size for each sentence (e.g. 100, the length of the vector). I don't understand how it creates a vector of that length.

I am following this link. There, a vector is created for each sentence and saved in the Doc2Vec model. I can't use this model to test new (production) data, because it has no vector for a new sentence. The error shown for new data is:

KeyError: "tag 'Test_2028' not seen in training corpus/invalid"


Solution

  • If you've created a gensim Doc2Vec model with your training data, it will only know trained vectors for the document tags that were present in the training data.

    However, there is also the infer_vector() method, which can infer a compatible document vector for a new text. The new text should be tokenized the same way as the training data and passed to infer_vector() as a list of string tokens.