I am using gensim's Doc2Vec. Is there an efficient way to find the vocabulary size from a trained Doc2Vec model? One crude way is to count the unique words in the corpus myself, but if the data is huge (1 GB or more) that won't be efficient.
If model is your trained Doc2Vec model, then the number of unique word tokens in the surviving vocabulary (after your min_count pruning has been applied) is available from:
len(model.wv.vocab)
The number of trained document tags is available from:
len(model.docvecs)
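For intuition about what "surviving vocabulary" means here: gensim discards any word whose total corpus frequency is below min_count before training, so the reported size is the count of unique words at or above that threshold. A minimal pure-Python sketch of that filtering (an illustration, not gensim's actual implementation):

```python
from collections import Counter

def surviving_vocab_size(documents, min_count=2):
    """Count unique words whose total corpus frequency is at least
    min_count, mirroring the effect of Doc2Vec's min_count pruning."""
    counts = Counter(word for doc in documents for word in doc)
    return sum(1 for freq in counts.values() if freq >= min_count)

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["a", "cat", "ran"],
]
# "the", "cat", and "sat" each appear twice; the rest appear once.
print(surviving_vocab_size(docs, min_count=2))  # -> 3
```

Note this only approximates the model's own count; the authoritative number is always len(model.wv.vocab), since gensim applies its pruning during build_vocab(). Also, in gensim 4.0 and later the vocab dict was restructured, so on newer versions you may need len(model.wv) instead.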