nlp gensim lda

How to map topic to a document after topic modeling is done with LDA?


Is there a way to map the topics generated by LDA back to the list of documents and identify which topic each document belongs to? I am interested in clustering documents with unsupervised learning and separating them into the appropriate clusters.

For example, suppose I have 10 topics after training an LDA model with the best hyperparameters. Given a new sentence or document entered by the user, the pre-trained model should return the number of the topic, out of those already defined, that the input belongs to.

I look forward to your suggestions. :)

P.S. I am using Gensim for NLP.


Solution

  • Using quanteda (an R package) together with topicmodels, you can achieve this as follows:

    library(quanteda)     # convert(), docvars()
    library(topicmodels)  # LDA(), topics()
    dtm <- convert(dfmat_news, to = "topicmodels")
    lda <- LDA(dtm, k = 10)  # 10 topics in this case
    

    Then you can obtain the most likely topics using the command topics() and save them as a document-level variable.

    docvars(dfmat_news, 'topic') <- topics(lda)
    head(topics(lda), 20)    
    

    Here is the tutorial: https://tutorials.quanteda.io/machine-learning/topicmodel/

    Hope it is clear and useful :)
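
Since the question mentions Gensim, the same idea can probably be sketched in Python as well. This is only a minimal sketch, not a definitive implementation: the helper names `dominant_topic`, `assign_topic` and `demo`, the toy corpus, and the training settings are all made up for illustration. It assumes a trained `LdaModel` and the `Dictionary` it was trained with, and relies on `Dictionary.doc2bow` and `LdaModel.get_document_topics`, which return a bag-of-words vector and a list of `(topic_id, probability)` pairs respectively.

```python
from typing import Iterable, List, Tuple


def dominant_topic(topic_probs: Iterable[Tuple[int, float]]) -> Tuple[int, float]:
    """Pick the (topic_id, probability) pair with the highest probability."""
    return max(topic_probs, key=lambda pair: pair[1])


def assign_topic(lda, dictionary, tokens: List[str]) -> Tuple[int, float]:
    """Map one tokenized document to its most likely topic.

    `lda` is assumed to be a trained gensim LdaModel and `dictionary`
    the gensim Dictionary used to build the training corpus.
    """
    # Words that are not in the dictionary are silently ignored by doc2bow.
    bow = dictionary.doc2bow(tokens)
    # Ask for (almost) all topics so that no low-probability topic is cut off.
    topic_probs = lda.get_document_topics(bow, minimum_probability=0.0)
    topic_id, prob = dominant_topic(topic_probs)
    return int(topic_id), float(prob)


def demo() -> None:
    """Toy end-to-end run; requires gensim to be installed."""
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Hypothetical, already tokenized documents.
    docs = [
        ["cat", "dog", "pet", "animal"],
        ["dog", "pet", "vet", "animal"],
        ["stock", "market", "price", "trade"],
        ["market", "trade", "bank", "price"],
    ]
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
                   passes=10, random_state=0)

    # Topic for every training document (the "cluster" label).
    for doc in docs:
        print(assign_topic(lda, dictionary, doc), doc)

    # Topic for a new, unseen document entered by the user.
    new_doc = "my dog needs a vet".lower().split()
    print(assign_topic(lda, dictionary, new_doc), new_doc)
```

Calling `demo()` trains a tiny model and prints one `(topic_id, probability)` pair per document. For a real pre-trained model you would skip the training part and call `assign_topic` with your own model and dictionary. If you want soft clusters instead of a single label, keep the whole list returned by `get_document_topics` rather than only its maximum.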