python nlp gensim doc2vec

gensim doc2vec - How to infer label


I am using gensim's doc2vec implementation and I have a few thousand documents tagged with four labels.

yield TaggedDocument(text_tokens, [labels])

I'm training a Doc2Vec model with a list of these TaggedDocuments. However, I'm not sure how to infer the tag for a document that was not seen during training. I see that there is an infer_vector method, which returns the embedding vector for a list of tokens. But how can I get the most likely label from that?

An idea would be to infer the vectors for every label that I have and then calculate the cosine similarity between these vectors and the vector for the new document I want to classify. Is this the way to go? If so, how can I get the vectors for each of my four labels?
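To make the idea concrete, here is a minimal sketch of the comparison step on its own, with no gensim involved. The vectors and label names below are made up for illustration, and cosine_similarity and nearest_label are hypothetical helper names, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Assumes neither vector is all zeros (that would divide by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_label(doc_vector, label_vectors):
    """Return the label whose vector is most similar to doc_vector."""
    return max(
        label_vectors,
        key=lambda label: cosine_similarity(doc_vector, label_vectors[label]),
    )


# Made-up 3-dimensional vectors standing in for the four label vectors.
label_vectors = {
    "label_a": [1.0, 0.0, 0.0],
    "label_b": [0.0, 1.0, 0.0],
    "label_c": [0.0, 0.0, 1.0],
    "label_d": [-1.0, 0.0, 0.0],
}

# Made-up vector standing in for the inferred vector of a new document.
new_doc_vector = [0.1, 0.9, 0.2]

print(nearest_label(new_doc_vector, label_vectors))
```

In practice the label vectors and the document vector would come from the trained model rather than being written by hand.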


Solution

  • I found the solution:

    model.docvecs['my_tag']
    

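    gives me the vector for a given tag. Easy. (In gensim 4.0 and later, docvecs was renamed to dv, so the equivalent lookup is model.dv['my_tag'].)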