python algorithm cluster-analysis latent-semantic-indexing

Clustering using the Latent Dirichlet Allocation algorithm in gensim


Is it possible to do clustering in gensim for a given set of inputs using LDA? How can I go about it?


Solution

  • LDA produces a lower-dimensional representation of the documents in a corpus: each document is described by its weights over the topics. You could apply a clustering algorithm such as k-means to this low-dimensional representation. Since each axis corresponds to a topic, a simpler approach is to assign each document to the topic onto which its projection is largest, i.e. its highest-weighted topic.
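
The simpler approach could be sketched roughly as follows. This is a minimal illustration, not a tuned pipeline: the helper names (`dominant_topic`, `to_dense`, `cluster_by_dominant_topic`, `lda_cluster`) and the parameter values (`num_topics=2`, `passes=20`, `seed=0`) are assumptions made for the example, and the input is assumed to be already tokenized. The gensim calls used are `corpora.Dictionary`, `doc2bow`, `models.LdaModel` and `get_document_topics`.

```python
def dominant_topic(topic_probs):
    """Return the id of the topic with the largest weight.

    topic_probs is a list of (topic_id, probability) pairs, the format
    returned by gensim's LdaModel.get_document_topics.
    """
    return max(topic_probs, key=lambda pair: pair[1])[0]


def to_dense(topic_probs, num_topics):
    """Turn (topic_id, probability) pairs into a fixed-length vector.

    Topics missing from topic_probs get weight 0.0. Vectors like this are
    what you would pass to a clustering algorithm such as k-means.
    """
    vector = [0.0] * num_topics
    for topic_id, prob in topic_probs:
        vector[topic_id] = prob
    return vector


def cluster_by_dominant_topic(doc_topics):
    """Assign each document to its highest-weighted topic."""
    return [dominant_topic(topic_probs) for topic_probs in doc_topics]


def lda_cluster(texts, num_topics=2, passes=20, seed=0):
    """Fit LDA on tokenized texts and return one cluster label per document.

    texts is a list of token lists, e.g. [["cat", "dog"], ["cpu", "ram"]].
    The default parameter values are illustrative assumptions.
    """
    # Imported here so the helpers above can be used without gensim installed.
    from gensim import corpora, models

    dictionary = corpora.Dictionary(texts)                 # token -> integer id
    corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words vectors
    lda = models.LdaModel(
        corpus,
        num_topics=num_topics,
        id2word=dictionary,
        passes=passes,
        random_state=seed,
    )
    # minimum_probability=0.0 asks for every topic, not only the likely ones.
    doc_topics = [
        lda.get_document_topics(bow, minimum_probability=0.0) for bow in corpus
    ]
    return cluster_by_dominant_topic(doc_topics)
```

Calling `lda_cluster(texts)` on a list of tokenized documents returns one topic id per document, which serves as the cluster label. For the k-means route, the vectors from `to_dense` could instead be handed to a clustering implementation of your choice; the number of clusters then no longer has to equal the number of topics.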