I can't seem to find it or probably my knowledge on statistics and its terms are the problem here but I want to achieve something similar to the graph found on the bottom page of the LDA lib from PyPI and observe the uniformity/convergence of the lines. How can I achieve this with Gensim LDA?
You are right to wish to plot the convergence of your model fitting. Gensim unfortunately does not seem to make this very straight forward.
Run the model in such a way that you will be able to analyze the output of the model fitting function. I like to setup a log file.
import logging
logging.basicConfig(filename='gensim.log',
format="%(asctime)s:%(levelname)s:%(message)s",
level=logging.INFO)
Set the eval_every
parameter in LdaModel
. The lower this value is the better resolution your plot will have. However, computing the perplexity can slow down your fit a lot!
lda_model =
LdaModel(corpus=corpus,
id2word=id2word,
num_topics=30,
eval_every=10,
pass=40,
iterations=5000)
Parse the log file and make your plot.
import re
import matplotlib.pyplot as plt
p = re.compile("(-*\d+\.\d+) per-word .* (\d+\.\d+) perplexity")
matches = [p.findall(l) for l in open('gensim.log')]
matches = [m for m in matches if len(m) > 0]
tuples = [t[0] for t in matches]
perplexity = [float(t[1]) for t in tuples]
liklihood = [float(t[0]) for t in tuples]
iter = list(range(0,len(tuples)*10,10))
plt.plot(iter,liklihood,c="black")
plt.ylabel("log liklihood")
plt.xlabel("iteration")
plt.title("Topic Model Convergence")
plt.grid()
plt.savefig("convergence_liklihood.pdf")
plt.close()