Tags: gensim, word2vec, doc2vec

Is there any way to get the vocabulary size from a doc2vec model?


I am using gensim doc2vec. I want to know if there is an efficient way to get the vocabulary size from a doc2vec model. One crude way is to count the words in the corpus directly, but if the data is huge (1 GB or more), that won't be efficient.


Solution

  • If model is your trained Doc2Vec model, then the number of unique word tokens that remain in the vocabulary after your min_count cutoff is applied is available from:

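    len(model.wv.vocab)

    In gensim 4.0 and later, wv.vocab was removed; use len(model.wv.key_to_index) or len(model.wv) instead.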

    The number of trained document tags is available from:

    len(model.docvecs)

    In gensim 4.0 and later, docvecs was renamed to dv, so use len(model.dv).
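
A minimal sketch that wraps both lookups so they work on either side of the gensim 4.0 rename. It assumes the gensim 4.0 attribute names key_to_index and dv and falls back to the older vocab and docvecs names used in the answer. The helper names vocabulary_size and document_tag_count are hypothetical and are not part of gensim.

```python
def vocabulary_size(model):
    """Number of unique word tokens kept after min_count filtering."""
    wv = model.wv
    # Assumption: gensim 4.0+ exposes the surviving vocabulary as key_to_index.
    if hasattr(wv, "key_to_index"):
        return len(wv.key_to_index)
    # Older gensim releases use the vocab dict shown in the answer.
    return len(wv.vocab)


def document_tag_count(model):
    """Number of trained document tags."""
    # Assumption: gensim 4.0+ names the document vectors dv;
    # older releases use docvecs, as in the answer.
    doc_vectors = getattr(model, "dv", None)
    if doc_vectors is None:
        doc_vectors = model.docvecs
    return len(doc_vectors)
```

Both helpers only read sizes already stored on the trained model, so neither rescans the corpus, however large it is.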