I am working on Doc2vec for text classification. It creates a vector of a given size (e.g. 100, the length of the vector) for each sentence. I don't understand how it creates a vector of that length.
I am following this link. There, a vector is created for each sentence and saved in the Doc2Vec model. I can't use this model to test on new (production) data, because the model has no vector for a new sentence. The error shown for new data is:
KeyError: "tag 'Test_2028' not seen in training corpus/invalid"
If you've created a gensim Doc2Vec model with your training data, it will only know trained vectors for the document tags that were present in the training data.
However, there's also the method infer_vector(), which can infer a compatible document-vector for a new text. The new text should be tokenized the same as the training data, and passed as a list-of-string-tokens to infer_vector().