I want to find associations between only a few words. For example, given this input:
{
library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)
findAssocs(tdm, c("oil", "opec", "xyz"), 0.1)
}
Here, I want only the correlations between c("oil", "opec", "xyz")
and soln = c("was","are","were","am","is","been","being","be"),
not the correlations with every term in the TDM.
How can I achieve this?
I could filter these results out after the computation, but I would like it to be computationally efficient, so the associations should be calculated only for the terms in the soln
vector shown above and not for any other words.
I solved this by looking more closely at the implementation of the findAssocs() function. It uses cor()
internally to compute the associations, so the solution to the problem above is something like:
cd <- cor(termFreq1[j, ], termFreq1[i, ])
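For completeness, here is a minimal sketch of how that one-liner could be applied to the crude example from the question. It assumes that termFreq1 above stands for the term-document matrix converted to a dense matrix, and that i and j are the rows of the two terms being compared. The names targets, targets_present, soln_present, m_targets, m_soln and assoc are made up for illustration. Unlike findAssocs(), it returns raw correlations without rounding or a lower limit.

```r
library(tm)

data("crude")
tdm <- TermDocumentMatrix(crude)

targets <- c("oil", "opec", "xyz")
soln <- c("was", "are", "were", "am", "is", "been", "being", "be")

# Keep only the terms that actually occur in the matrix. "xyz" does not,
# and very short words may be missing because of the default word-length filter.
targets_present <- intersect(targets, Terms(tdm))
soln_present <- intersect(soln, Terms(tdm))

# Subset the sparse matrix first, then densify, so only the needed rows
# are converted. Transpose so that documents are rows and terms are columns.
m_targets <- t(as.matrix(tdm[targets_present, ]))
m_soln <- t(as.matrix(tdm[soln_present, ]))

# With two matrices, cor() correlates every column of the first with
# every column of the second: rows are soln terms, columns are target terms.
assoc <- cor(m_soln, m_targets)
print(assoc)
```

Only length(soln_present) * length(targets_present) correlations are computed here, instead of one per term in the whole matrix. If a term had the same count in every document, cor() would return NA for it with a warning.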