r remove sparse terms from more than one tdm


This follows on from "r remove sparse terms by type of documents". Now I have two TermDocumentMatrix objects and want to remove sparse terms from both. I tried the loop below, but it doesn't work. Any ideas?

library(tm)
library(Rstem)

data(crude)

spl <- runif(length(crude)) < 0.7
crude_1 <- crude[spl]
crude_2 <- crude[!spl]

controls <- list(
  tolower = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords("english"),
  stemming = function(word) wordStem(word, language = "english")
)

tdm_1 <- TermDocumentMatrix(crude_1, controls)
tdm_2 <- TermDocumentMatrix(crude_2, controls)

## Doesn't work.

for(i in 1:2){
  assign(paste0("TDM_", i), 
  removeSparseTerms(paste0('tdm_', i), 0.98)
}

## But this is ok.

removeSparseTerms(tdm_1, 0.98)

Thanks again!


Solution

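  • This seems to work. paste0('tdm_', i) only builds a character string, so removeSparseTerms receives a name rather than the matrix itself; get() looks up the object that has that name: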

    for(i in 1:2){
      assign(paste0("TDM_", i), 
             removeSparseTerms(get(paste0('tdm_', i)), 0.98))
    }
    TDM_1
    # <<TermDocumentMatrix (terms: 707, documents: 16)>>
    # Non-/sparse entries: 1245/10067
    # Sparsity           : 89%
    # Maximal term length: 13
    # Weighting          : term frequency (tf)
    TDM_2
    # <<TermDocumentMatrix (terms: 308, documents: 4)>>
    # Non-/sparse entries: 377/855
    # Sparsity           : 69%
    # Maximal term length: 16
    # Weighting          : term frequency (tf)
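
As an alternative to building variable names with paste0(), get() and assign(), the matrices can be kept in a named list and processed with lapply(). The following is only a sketch under a few assumptions that are not in the original post: the split is fixed (first 14 documents versus the rest) instead of random so the result is reproducible, stemming is left out to avoid the extra Rstem dependency, and the names corpora, tdms and TDMs are made up for illustration.

```r
library(tm)

data(crude)

# Fixed split instead of runif() so the sketch is reproducible
# (an assumption; the question uses a random split).
idx <- seq_along(crude) <= 14
corpora <- list(tdm_1 = crude[idx], tdm_2 = crude[!idx])

# Same controls as in the question, minus stemming.
controls <- list(
  tolower = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords("english")
)

# Build one TermDocumentMatrix per corpus and keep them in a named list.
tdms <- lapply(corpora, TermDocumentMatrix, control = controls)

# Apply removeSparseTerms to every matrix; no get()/assign() needed.
TDMs <- lapply(tdms, removeSparseTerms, sparse = 0.98)

# Individual results are reached by name.
TDMs$tdm_1
TDMs$tdm_2
```

If the objects tdm_1 and tdm_2 already exist in the workspace, mget(paste0("tdm_", 1:2)) should collect them into the same kind of named list, after which the second lapply() call applies unchanged. Keeping everything in one list also scales to more than two matrices without touching the loop.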