python lda gensim convergence

How to monitor convergence of Gensim LDA model?


I can't seem to find how to do this, or perhaps my knowledge of statistics and its terminology is the problem, but I want to produce something similar to the graph at the bottom of the PyPI page for the lda library and observe the uniformity/convergence of the lines. How can I achieve this with Gensim LDA?


Solution

  • You are right to want to plot the convergence of your model fit. Unfortunately, Gensim does not make this very straightforward.

    1. Run the model in a way that lets you analyze the output of the model-fitting function. I like to set up a log file.

      import logging
      logging.basicConfig(filename='gensim.log',
                          format="%(asctime)s:%(levelname)s:%(message)s",
                          level=logging.INFO)
      
    2. Set the eval_every parameter in LdaModel. The lower this value, the better the resolution of your plot. However, computing the perplexity can slow down your fit a lot!

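      from gensim.models import LdaModel

      # eval_every=10 logs a perplexity estimate every 10 model updates
      lda_model = LdaModel(corpus=corpus,
                           id2word=id2word,
                           num_topics=30,
                           eval_every=10,
                           passes=40,  # the keyword is `passes`; `pass` is a reserved word
                           iterations=5000)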
      
    3. Parse the log file and make your plot.

      import re
      import matplotlib.pyplot as plt

      # Capture the per-word bound and the perplexity estimate from each matching log line
      p = re.compile(r"(-*\d+\.\d+) per-word .* (\d+\.\d+) perplexity")
      with open('gensim.log') as log_file:
          matches = [p.findall(line) for line in log_file]
      matches = [m for m in matches if len(m) > 0]
      tuples = [t[0] for t in matches]
      perplexity = [float(t[1]) for t in tuples]
      likelihood = [float(t[0]) for t in tuples]
      # One point per evaluation; the step of 10 matches eval_every=10 above
      iterations = list(range(0, len(tuples) * 10, 10))
      plt.plot(iterations, likelihood, c="black")
      plt.ylabel("log likelihood")
      plt.xlabel("iteration")
      plt.title("Topic Model Convergence")
      plt.grid()
      plt.savefig("convergence_likelihood.pdf")
      plt.close()
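
The parsing in step 3 can also be wrapped in a small function so it can be checked without running a full fit. This is only a minimal sketch: it assumes the log lines follow the "... per-word bound, ... perplexity estimate ..." pattern that the regular expression above targets. The name `parse_gensim_log` is hypothetical, and the sample lines use made-up numbers for illustration, not real Gensim output.

```python
import re

# Same idea as the pattern in step 3: per-word bound first, perplexity second.
PATTERN = re.compile(r"(-?\d+\.\d+) per-word .* (\d+\.\d+) perplexity")


def parse_gensim_log(lines):
    """Return (likelihood, perplexity) lists from an iterable of log lines.

    Hypothetical helper; lines that do not match the pattern are skipped.
    """
    likelihood, perplexity = [], []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            likelihood.append(float(match.group(1)))
            perplexity.append(float(match.group(2)))
    return likelihood, perplexity


# Made-up sample lines for illustration only (assumed log format).
sample = [
    "2024-01-01 00:00:00,000:INFO:-9.512 per-word bound, 730.2 perplexity "
    "estimate based on a held-out corpus of 100 documents with 5000 words",
    "2024-01-01 00:00:01,000:INFO:some unrelated training message",
    "2024-01-01 00:00:02,000:INFO:-8.913 per-word bound, 482.1 perplexity "
    "estimate based on a held-out corpus of 100 documents with 5000 words",
]
likelihood, perplexity = parse_gensim_log(sample)
```

To use it on a real run, pass the open log file instead of the sample list, for example `with open('gensim.log') as f: likelihood, perplexity = parse_gensim_log(f)`, and plot the result as in step 3.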