r, matrix, tm, term-document-matrix

tm package: Output of findAssocs() in a matrix instead of a list in R


Consider the following list:

library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)
a <- findAssocs(tdm, c("oil", "opec", "xyz"), c(0.7, 0.75, 0.1))

How can I get a data frame with all the terms associated with these 3 words as columns, showing:

  1. The corresponding correlation coefficient (if it exists)
  2. NA if it does not exist for this word (for example, the pair (oil, they) would show NA)

Solution

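  • Here's a solution that uses reshape2 to reshape the data. It first stacks the associations into a long data frame, then casts that to wide format:

    library(reshape2)

    # One block of rows per query word: associated term, correlation, query word.
    # A word with no associations contributes a single row with NA in xterm and cor.
    aa <- do.call(rbind, Map(function(d, n)
        cbind.data.frame(
          xterm = if (length(d) > 0) names(d) else NA,
          cor   = if (length(d) > 0) d else NA,
          term  = n),
        a, names(a))
    )

    # Wide format: one row per query word, one column per associated term.
    # Pairs with no reported correlation are filled with NA.
    dcast(aa, term ~ xterm, value.var = "cor")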
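
For comparison, the same reshaping can be sketched in base R without reshape2. This is only a minimal sketch: it assumes `a` is a list with one named numeric vector per query word (empty when nothing passes the threshold), which is how the question uses the result of findAssocs(). The list below is a hypothetical stand-in with made-up correlation values so the snippet runs without tm; `all_terms`, `m` and `res` are names introduced here for illustration.

```r
# Hypothetical stand-in for the result of findAssocs(); the values are made up.
a <- list(
  oil  = c(opec = 0.87, barrel = 0.75),
  opec = c(oil = 0.87, they = 0.80),
  xyz  = numeric(0)                      # no associations for this word
)

# Every associated term that appears for at least one query word.
all_terms <- sort(unique(unlist(lapply(a, names))))

# Start from an all-NA matrix: one row per query word, one column per term.
m <- matrix(NA_real_, nrow = length(a), ncol = length(all_terms),
            dimnames = list(names(a), all_terms))

# Fill in the correlations that exist; every other cell stays NA.
for (w in names(a)) {
  if (length(a[[w]]) > 0) {
    m[w, names(a[[w]])] <- a[[w]]
  }
}

res <- as.data.frame(m)
res
```

With the real `a` from the question, dropping the stand-in list and running the rest should give one row per query word. Unlike the dcast() version, a word with no associations simply yields a row of NA values and no placeholder entry is needed for it.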