r nlp tm

How to restrict findAssocs() to selected words?


I want to find associations between only a few words. For example, given this input:

    library(tm)    # provides TermDocumentMatrix(), findAssocs() and the crude corpus
    data("crude")
    tdm <- TermDocumentMatrix(crude)
    findAssocs(tdm, c("oil", "opec", "xyz"), 0.1)

Here, I want only the correlations between c("oil", "opec", "xyz") and soln = c("was","are","were","am","is","been","being","be"), not the correlations against every term in the TDM.

How to achieve this?

I could filter these results out after the computation, but I would like this to be computationally efficient: the associations should be calculated only against the words in the soln vector shown above, and not against any other words.


Solution

  • I solved this by looking deeper into the implementation of the findAssocs() function. It uses cor() internally to determine the associations. The solution to the above problem is thus something like:

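    # termFreq1 is presumably the term-document counts as a plain matrix; i and j index the rows of the two words
    cd <- cor(termFreq1[j, ], termFreq1[i, ])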
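The one-liner above can be extended into a minimal sketch that correlates every word of interest against every word in soln in a single cor() call. The helper name restricted_assocs, its arguments, and the toy matrix are assumptions made for illustration; none of them are part of tm, and the output is only shaped to resemble what findAssocs() returns.

```r
# Hypothetical helper (not part of tm): correlate only `terms` against `targets`.
# `m` is a numeric matrix with terms as row names and documents as columns.
restricted_assocs <- function(m, terms, targets, corlimit = 0) {
  # Silently drop words that do not occur in the matrix (e.g. "xyz").
  terms   <- intersect(unique(terms), rownames(m))
  targets <- intersect(unique(targets), rownames(m))
  if (!length(terms) || !length(targets)) return(list())

  # cor() correlates columns, so transpose: documents become rows, terms columns.
  # A term with constant counts yields NA plus a warning, which is suppressed here.
  cors <- suppressWarnings(
    cor(t(m[terms, , drop = FALSE]), t(m[targets, , drop = FALSE]))
  )

  # For each term, keep the targets at or above the threshold, strongest first.
  lapply(setNames(terms, terms), function(w) {
    x <- setNames(cors[w, ], targets)
    x <- x[!is.na(x) & x >= corlimit]
    sort(round(x, 2), decreasing = TRUE)
  })
}

# Toy term-document counts (made-up numbers, four documents).
m <- rbind(
  oil  = c(2, 0, 1, 3),
  opec = c(1, 0, 0, 2),
  is   = c(2, 0, 1, 3),
  was  = c(0, 1, 0, 0)
)
colnames(m) <- paste0("doc", 1:4)

res <- restricted_assocs(m, c("oil", "opec", "xyz"), c("is", "was", "be"), corlimit = 0.1)
print(res)
```

With the tdm from the question, the call would presumably be restricted_assocs(as.matrix(tdm), c("oil", "opec", "xyz"), soln, 0.1). Note that as.matrix() builds a dense copy of the whole TDM, so for a large corpus it may be worth subsetting the TDM to the needed rows before converting.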