LDA with topicmodels package for R, how do I get the topic probability for each term?


I'm using the topicmodels package for LDA and would like to create a visualization that shows how related or unrelated the topics are. I envision a cluster of words that are unique to topic 1, with a few shared keywords connecting it to another topic. Any advice here would be great.

To do this, I need to know the probability of each term in each topic. How do I get this with the topicmodels package? I can view the terms with:

terms(LDAmodel, 15)

But I don't know how to get the probability values. Any ideas?


Solution

  • You can use posterior()$terms to get the posterior probability of each term in each topic; it returns a matrix with one row per topic and one column per term. posterior()$topics gives the topic probabilities for each document.

    Example adapted from help(LDA):

    library(topicmodels)
    data("AssociatedPress", package = "topicmodels")
    lda <- LDA(AssociatedPress[1:20,], k = 2)
    terms <- posterior(lda)$terms
    
    ## posterior probability for the first 5 terms (alphabetically)
    terms[,1:5]
             aaron      abandon    abandoned   abandoning       abbott
    1 3.720076e-44 3.720076e-44 3.720076e-44 3.720076e-44 3.720076e-44
    2 3.720076e-44 3.720076e-44 3.720076e-44 3.720076e-44 3.720076e-44
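
For the visualization described in the question, one possible next step is to pull out the most probable terms of each topic together with their probabilities, then check which terms overlap between topics. The following is only a minimal sketch under stated assumptions: `beta` is a small made-up matrix standing in for posterior(lda)$terms (topics in rows, terms in columns), and top_terms() is a hypothetical helper, not a topicmodels function.

```r
# Hypothetical stand-in for posterior(lda)$terms: a topics x terms matrix
# whose rows each sum to 1. The numbers are made up for illustration.
beta <- matrix(
  c(0.50, 0.30, 0.15, 0.05,
    0.05, 0.35, 0.10, 0.50),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1", "2"), c("court", "police", "school", "market"))
)

# top_terms() is a hypothetical helper, not part of topicmodels.
# It returns the n most probable terms of one topic (a row of beta)
# as a named vector of probabilities, highest first.
top_terms <- function(beta, topic, n = 3) {
  head(sort(beta[topic, ], decreasing = TRUE), n)
}

top1 <- top_terms(beta, 1, n = 2)
top2 <- top_terms(beta, 2, n = 2)

# Terms that appear in both top lists are candidates for the
# "shared" keywords that connect the two topics.
shared <- intersect(names(top1), names(top2))
```

With a fitted model, passing posterior(lda)$terms in place of the made-up matrix should work the same way, since it has the same layout. How many top terms to keep per topic is a judgment call that depends on how dense the final plot should be.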