This follows on from the question "r remove sparse terms by type of documents". I now have two TermDocumentMatrix objects and want to remove sparse terms from each of them. I tried the loop below, but it doesn't work. Any ideas?
library(tm)
library(Rstem)
data(crude)
spl <- runif(length(crude)) < 0.7
crude_1 <- crude[spl]
crude_2 <- crude[!spl]
controls <- list(
tolower = TRUE,
removePunctuation = TRUE,
stopwords = stopwords("english"),
stemming = function(word) wordStem(word, language = "english")
)
tdm_1 <- TermDocumentMatrix(crude_1, controls)
tdm_2 <- TermDocumentMatrix(crude_2, controls)
## This doesn't work.
for(i in 1:2){
assign(paste0("TDM_", i),
removeSparseTerms(paste0('tdm_', i), 0.98))
}
## But this is ok.
removeSparseTerms(tdm_1, 0.98)
Thanks again!
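This seems to work. paste0() only returns the object's name as a character string, so removeSparseTerms() never receives the matrix itself; wrapping the name in get() looks up the object with that name: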
for(i in 1:2){
assign(paste0("TDM_", i),
removeSparseTerms(get(paste0('tdm_', i)), 0.98))
}
TDM_1
# <<TermDocumentMatrix (terms: 707, documents: 16)>>
# Non-/sparse entries: 1245/10067
# Sparsity : 89%
# Maximal term length: 13
# Weighting : term frequency (tf)
TDM_2
# <<TermDocumentMatrix (terms: 308, documents: 4)>>
# Non-/sparse entries: 377/855
# Sparsity : 69%
# Maximal term length: 16
# Weighting : term frequency (tf)
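
As an aside, if you would rather avoid assign() and get() altogether, one option is to keep the matrices in a named list and use lapply(). This is only a minimal sketch, not the method from the question: the names corpora, tdms and tdms_dense are made up for illustration, the corpus is split by position instead of with runif() so the result is reproducible, and the stemming step is left out so that only tm is needed.

```r
library(tm)
data(crude)  # 20 Reuters articles shipped with tm

# Hypothetical fixed split (the question used a random runif() split)
corpora <- list(first = crude[1:10], second = crude[11:20])

# Build one term-document matrix per corpus with shared control options
tdms <- lapply(corpora, TermDocumentMatrix,
               control = list(tolower = TRUE,
                              removePunctuation = TRUE,
                              stopwords = TRUE))

# Apply removeSparseTerms() to every matrix; the result is again a named list
tdms_dense <- lapply(tdms, removeSparseTerms, sparse = 0.98)
```

Each result is then available by name, for example tdms_dense$first, and adding a third corpus only means adding one more list element.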