r, tm, topic-modeling

How to extract terms and probabilities from tmResult$terms in topic modeling?


I would like to create a separate word cloud for each of the 8 topics in my LDA model. I extracted the top 40 words for each of the 8 topics, which gives an object of length 320 containing the top words and their occurrence probabilities. I am struggling to access the terms and probabilities in my top_words_vector object. The example is hard to reproduce because of the tmResult object, but any hint would be much appreciated:

    library(tm)
    library(topicmodels)

    textdata <- base::readRDS(url("https://slcladal.github.io/data/sotu_paragraphs.rda", "rb"))

    english_stopwords <- readLines("https://slcladal.github.io/resources/stopwords_en.txt", encoding = "UTF-8")

    corpus <- Corpus(DataframeSource(textdata))

    processedCorpus <- tm_map(corpus, content_transformer(tolower))
    processedCorpus <- tm_map(processedCorpus, removeWords, english_stopwords)
    processedCorpus <- tm_map(processedCorpus, removePunctuation, preserve_intra_word_dashes = TRUE)
    processedCorpus <- tm_map(processedCorpus, removeNumbers)
    processedCorpus <- tm_map(processedCorpus, stemDocument, language = "en")
    processedCorpus <- tm_map(processedCorpus, stripWhitespace)

    minimumFrequency <- 5
    DTM <- DocumentTermMatrix(processedCorpus, control = list(bounds = list(global = c(minimumFrequency, Inf))))
    # drop documents that became empty after preprocessing
    sel_idx <- slam::row_sums(DTM) > 0
    DTM <- DTM[sel_idx, ]
    textdata <- textdata[sel_idx, ]

    K <- 8
    set.seed(9161)
    # compute the LDA model, inference via 100 iterations of Gibbs sampling
    topicModel <- LDA(DTM, K, method = "Gibbs", control = list(iter = 100, verbose = 25))
    tmResult <- topicmodels::posterior(topicModel)
    tmResult$terms

    top_words_vector <- c() # an empty container for 320 values: the top 40 words of each of the 8 topics
    for (i in 1:8) {
      top_words_vector <- c(top_words_vector, sort(tmResult$terms[i, ], decreasing = TRUE)[1:40])
    }

    top_words_vector

wordcloud() takes the terms and the probabilities as separate arguments, and that is what I am trying to extract from top_words_vector:

    mycolors <- brewer.pal(8, "Dark2")
    wordcloud(c("apple", "banana"), c(0.8, 0.2), random.order = TRUE, colors = mycolors)

Solution

  • top_words_vector is a named numeric vector: the values are the probabilities, and names(top_words_vector) returns the terms they belong to.

    library(tm)
    library(topicmodels)
    library(RColorBrewer)
    library(wordcloud)
    
    
    mycolors <- brewer.pal(8, "Dark2")
    wordcloud(names(top_words_vector), top_words_vector, random.order = TRUE, colors = mycolors)
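
The call above draws all 320 words in a single cloud. Since the question asks for one cloud per topic, here is a minimal sketch of the same names() idea applied topic by topic. It is not taken from the original answer. The small term_probs matrix is a made-up stand-in for tmResult$terms (assumed to have one row per topic and the terms as column names), and top_terms is a hypothetical helper introduced only for illustration.

```r
# Toy stand-in for tmResult$terms: rows are topics, columns are terms.
# (Assumption: the real matrix has the same layout.)
term_probs <- matrix(
  c(0.5, 0.3, 0.2,
    0.1, 0.2, 0.7),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1", "2"), c("nation", "govern", "war"))
)

# Hypothetical helper: the top n terms of one topic as a named numeric vector.
top_terms <- function(term_probs, topic, n = 40) {
  head(sort(term_probs[topic, ], decreasing = TRUE), n)
}

# Split one topic's named vector into the two arguments wordcloud() expects.
top2  <- top_terms(term_probs, topic = 2, n = 2)
terms <- names(top2)   # character vector of terms
probs <- unname(top2)  # numeric vector of probabilities

# One cloud per topic; plotting is skipped if wordcloud is not installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  for (k in seq_len(nrow(term_probs))) {
    top_k <- top_terms(term_probs, k)
    wordcloud::wordcloud(names(top_k), unname(top_k), random.order = FALSE)
  }
}
```

With the real model, passing tmResult$terms in place of term_probs should give 8 clouds of 40 words each. If the 320-element top_words_vector has already been built, split(top_words_vector, rep(1:8, each = 40)) should recover one named vector per topic, because the loop appends the topics in order.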