Gensim: How to load corpus from saved lda model?


When I saved my LdaModel with lda_model.save('model'), it wrote 4 files:

  1. model
  2. model.expElogbeta.npy
  3. model.id2word
  4. model.state

I want to use pyLDAvis.gensim to visualize the topics, which seems to need the model, corpus and dictionary. I was able to load the model and dictionary with:

lda_model = LdaModel.load('model')
dict = corpora.Dictionary.load('model.id2word')

Is it possible to load the corpus? How?


Solution

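  • Sharing this here because it took me a while to find the answer as well. The corpus is not saved with the model, so it has to be rebuilt from the original tokenized texts. Note that dict shadows Python's built-in dict type, so the code below uses lda_dict instead.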

    # text_array is the list of tokenized documents you are analysing
    # (a list of lists of strings), e.g. text_array = [['volume', 'eventually', 'metric', 'rally'], ...]
    # lda_dict is a gensim.corpora.Dictionary object, e.g. the one loaded from 'model.id2word'
    
    bow_corpus = [lda_dict.doc2bow(doc) for doc in text_array]
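
To make the shape of the rebuilt corpus concrete, here is a minimal pure-Python sketch of what doc2bow produces for each document: a sorted list of (token_id, count) pairs, with tokens missing from the dictionary skipped. The helper doc2bow_sketch, the token2id mapping, and the sample documents are hypothetical stand-ins for illustration. This is not gensim's implementation, and with a real Dictionary you would call lda_dict.doc2bow as above.

```python
from collections import Counter


def doc2bow_sketch(token2id, doc):
    """Mimic the output shape of Dictionary.doc2bow.

    Returns sorted (token_id, count) pairs. Tokens that are not in
    token2id are skipped, which is what doc2bow does by default.
    """
    counts = Counter(token2id[tok] for tok in doc if tok in token2id)
    return sorted(counts.items())


# Hypothetical vocabulary and documents, for illustration only.
token2id = {"volume": 0, "eventually": 1, "metric": 2, "rally": 3}
text_array = [
    ["volume", "eventually", "metric", "rally"],
    ["rally", "rally", "volume", "unseen"],
]

bow_corpus = [doc2bow_sketch(token2id, doc) for doc in text_array]
print(bow_corpus)
# [[(0, 1), (1, 1), (2, 1), (3, 1)], [(0, 1), (3, 2)]]
```

A list in this format, built with the same dictionary the model was trained with, is what fills the corpus slot alongside lda_model and lda_dict when preparing the pyLDAvis visualization.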