r, word-cloud

Removing specific words from word cloud in R


I have made a word cloud in R for 2 songs. When I display the tdm, I get the frequency of each word for song 1 and song 2. I am also able to print the word cloud perfectly. My problem is that I do not want words in the tdm whose frequency is less than 2. How can I do that?

I wrote the code and got this output:

    > tdm = TermDocumentMatrix(corpus)
    > tdm = as.matrix(tdm)
    > tdm
              song 1  song 2
    act            0       2
    action         0       2
    actions        0       1
    activity       5       4

I only want the word activity, as it occurs more than once in both songs. In other words, I want to remove the words act, action, and actions. How can I do that?


Solution

  • You didn't provide data, so something like this should work:

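    library(tm)  # provides TermDocumentMatrix() and the crude example corpus
    data("crude")
    tdm <- TermDocumentMatrix(crude)

    # Keep only two documents so the matrix resembles the one in the question
    x <- as.matrix(tdm)[, 1:2]
    # Keep terms that occur more than once in both documents
    x[rowSums(apply(x, 2, ">", 1)) == 2, ]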
    

    Explanation: The line x <- as.matrix(tdm)[, 1:2] only takes the first 2 columns so that the data looks like yours (you didn't provide any); it isn't part of the solution itself. The call apply(x, 2, ">", 1) returns a logical value for each cell answering "is this count greater than 1?". I then wrap this in rowSums (logical values count as TRUE=1 and FALSE=0). Rows whose sum equals 2 (I had > 1 here before, but that was sloppy) meet the condition in both columns, which is what you're looking for. Then I use this output as a logical index: x[GRAB_THE_ROWS, ]. You can tear each step apart and run the code for yourself, as seen below:

    (step_1 <- apply(x, 2, ">", 1))
    (step_2 <- rowSums(step_1))
    (step_3 <- step_2 == 2)
    x[step_3, ]
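
    The same filter can also be sketched without apply, since comparing a matrix with > is already vectorized. The matrix below is a hand-built stand-in for the counts shown in the question (the real tdm was not provided, so the values and names are assumptions), and keep and filtered are illustrative names. With this data only one term survives, so drop = FALSE is used to keep the result a matrix instead of letting R collapse it to a vector:

```r
# Hypothetical matrix mirroring the counts shown in the question
x <- matrix(
  c(0, 0, 0, 5,   # song 1
    2, 2, 1, 4),  # song 2
  ncol = 2,
  dimnames = list(
    Terms = c("act", "action", "actions", "activity"),
    Docs  = c("song 1", "song 2")
  )
)

# x > 1 is a logical matrix; rowSums() counts the TRUE values per term
keep <- rowSums(x > 1) == ncol(x)

# drop = FALSE keeps the result a matrix even when only one term survives
filtered <- x[keep, , drop = FALSE]
filtered
```

    Using ncol(x) instead of a hard-coded 2 means the same line should work for more than two songs. If the goal is instead to drop terms whose total count across all songs is below 2, rowSums(x) >= 2 would be the condition to use; which of the two readings applies depends on what the question intended.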