How to get vocabulary word count from gensim word2vec?


I am using the gensim word2vec package in Python. I know how to get the vocabulary from the trained model, but how do I get the word count for each word in the vocabulary?


Solution

  • Each word in the vocabulary has an associated vocabulary object, which contains an index and a count.

    vocab_obj = w2v.vocab["word"]
    vocab_obj.count
    

    Output for the Google News w2v model: 2998437

    So to get the count for each word, you would iterate over all words and vocab objects in the vocabulary.

    for word, vocab_obj in w2v.vocab.items():
        # Do something with vocab_obj.count, e.g. print it next to the word
        print(word, vocab_obj.count)
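
The `vocab` dict used above belongs to older gensim releases. As far as I know, gensim 4.0 and later removed it in favour of `key_to_index` and `get_vecattr(word, "count")` on the vectors object, so here is a minimal sketch of a helper that tries the newer calls first and falls back to the older dict. The helper name `word_counts` and the `FakeOldVectors` stand-in are hypothetical and exist only so the snippet runs without a trained model; check the attribute names against the gensim version you have installed.

```python
from collections import namedtuple


def word_counts(kv):
    """Return {word: count} for a gensim KeyedVectors-like object."""
    if hasattr(kv, "get_vecattr"):
        # Assumed gensim >= 4.0 API: per-word attributes are read via get_vecattr.
        return {word: kv.get_vecattr(word, "count") for word in kv.key_to_index}
    # Older API, as in the answer above: a dict of vocabulary objects.
    return {word: vocab_obj.count for word, vocab_obj in kv.vocab.items()}


# Hypothetical stand-in for an old-style vectors object, for illustration only.
Vocab = namedtuple("Vocab", ["index", "count"])


class FakeOldVectors:
    vocab = {"cat": Vocab(0, 12), "dog": Vocab(1, 7)}


counts = word_counts(FakeOldVectors())

# List the words from most to least frequent.
for word, count in sorted(counts.items(), key=lambda item: -item[1]):
    print(word, count)
```

With a real model you would pass the vectors object instead of the stand-in, for example `word_counts(model.wv)` for a trained `Word2Vec` model, or the `KeyedVectors` object itself if you loaded pretrained vectors.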