Why am I getting the same set of words for different topics in a gensim LDA model? I used the parameters below, and I checked that there are no duplicate documents in my corpus.
import gensim

lda_model = gensim.models.ldamodel.LdaModel(
    corpus=MY_CORPUS,
    id2word=WORD_AND_ID,
    num_topics=4,
    minimum_probability=minimum_probability,
    random_state=100,
    update_every=1,
    chunksize=100,
    passes=10,
    alpha='auto',  # alternatives: 'symmetric', 'asymmetric'
    per_word_topics=True,
)
[
(0, '0.004*lily + 0.01*rose + 0.00*jasmine'),
(1, '0.005*geometry + 0.07*algebra + 0.01*calculation'),
(2, '0.003*painting + 0.001*brush + 0.01*colors'),
(3, '0.005*geometry + 0.07*algebra + 0.01*calculation')
]
Note that topics #1 and #3 are identical.
Each of the topics likely contains a large number of words, weighted differently. When a topic is displayed (e.g. using lda_model.show_topics()), you get only the few words with the largest weights. This does not mean that the topics do not differ across the remaining vocabulary.
You can control the number of displayed words to inspect the remaining weights:
show_topics(num_topics=4, num_words=10, log=False, formatted=True)
and increase the num_words parameter to include even more words.
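To check whether two topics that look identical in their top words really differ further down the vocabulary, you can compare their full word distributions. Below is a minimal sketch on made-up toy weights (not taken from your model). The helper names top_words and hellinger are hypothetical. With a trained model, I would assume each dictionary could be built from something like dict(lda_model.show_topic(topic_id, topn=len(WORD_AND_ID))).

```python
import math


def top_words(topic, n):
    """Return the n highest-weighted words of a {word: weight} topic."""
    ranked = sorted(topic.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def hellinger(p, q):
    """Hellinger distance between two {word: weight} distributions.

    0.0 means identical; 1.0 means no shared probability mass
    (assuming each distribution sums to 1).
    """
    vocab = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(w, 0.0)) - math.sqrt(q.get(w, 0.0))) ** 2
        for w in vocab
    )
    return math.sqrt(total / 2.0)


# Toy topics (illustrative values only): same top-3 words, different tails.
topic_1 = {"algebra": 0.4, "calculation": 0.3, "geometry": 0.2,
           "matrix": 0.1, "proof": 0.0}
topic_3 = {"algebra": 0.4, "calculation": 0.3, "geometry": 0.2,
           "matrix": 0.0, "proof": 0.1}

same_top = top_words(topic_1, 3) == top_words(topic_3, 3)
distance = hellinger(topic_1, topic_3)
print("same top-3 words:", same_top)
print("distance over full vocabulary:", round(distance, 3))
```

A distance close to zero would suggest that the two topics really are duplicates, while a clearly positive distance means the difference is only hidden by the short display.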
Now, there is also a possibility that the two topics really are this similar with your current settings. In that case, try making:
minimum_probability smaller (what is the value you use?),
passes larger,
chunksize smaller.
I encourage you to experiment with different values of these parameters to check whether any combination works better.
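One way to organise that experimentation is to enumerate the parameter combinations up front and train one model per combination. The sketch below only builds the combinations. The helper parameter_grid is hypothetical, and the candidate values are arbitrary assumptions rather than recommendations. The training loop is left as a comment because it depends on your MY_CORPUS and WORD_AND_ID.

```python
from itertools import product


def parameter_grid(options):
    """Yield one dict per combination of the given parameter values."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


# Illustrative search space (assumed values, adjust to your corpus size).
search_space = {
    "minimum_probability": [0.01, 0.001],
    "passes": [10, 20, 50],
    "chunksize": [100, 50],
}

candidates = list(parameter_grid(search_space))
print(len(candidates), "combinations to try")

# Possible usage with the model from the question (not run here):
# for params in candidates:
#     model = gensim.models.ldamodel.LdaModel(
#         corpus=MY_CORPUS, id2word=WORD_AND_ID, num_topics=4,
#         random_state=100, alpha='auto', **params)
#     print(params)
#     print(model.show_topics(num_topics=4, num_words=10))
```

Keeping random_state fixed across runs, as in the question, makes it easier to attribute any change in the topics to the parameters rather than to the initialisation.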