How can I get the term frequency weight in the tm package, i.e. tf = (count of the term in the document) / (total number of terms in the document)?
MyMatrix <- DocumentTermMatrix(a, control = list(weighting = weightTf))
With this weighting, the matrix shows the raw term counts rather than the normalized TF weights, like this:
Doc(1) 1 0 0 3 0 0 2
Doc(2) 0 0 0 0 0 0 0
Doc(3) 0 5 0 0 0 0 1
Doc(4) 0 0 0 2 2 0 0
Doc(5) 0 4 0 0 0 0 1
Doc(6) 5 0 0 0 1 0 0
Doc(7) 0 5 0 0 0 0 0
Doc(8) 0 0 0 1 0 0 7
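For example, you can define your own weighting function that divides each term count by the total number of terms in its document: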
library(tm)
corp <- Corpus(VectorSource(c(doc1="hello world", doc2="hello new world")))
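# tm applies the weighting to the term-document matrix (terms x documents),
# so each column is one document and m$j is the document index of each entry.
myfun <- WeightFunction(function(m) {
  cs <- slam::col_sums(m)   # total number of terms in each document
  m$v <- m$v / cs[m$j]      # divide each count by its document's total
  m
}, "Term Frequency by Total Document Term Frequency", "termbytot")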
dtm <- DocumentTermMatrix(corp, control = list(weighting = myfun))
inspect(dtm)
# <<DocumentTermMatrix (documents: 2, terms: 3)>>
# Non-/sparse entries: 5/1
# Sparsity           : 17%
# Maximal term length: 5
#
#     Terms
# Docs     hello       new     world
#    1 0.5000000 0.0000000 0.5000000
#    2 0.3333333 0.3333333 0.3333333
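If you already have a matrix of raw counts, the same normalization can probably be applied after the fact instead of through a weighting function. The following is only a sketch under a few assumptions: the names `doc_totals` and `tf` are made up for illustration, the corpus is built with `VCorpus`, and the matrix is a documents x terms DTM, so the per-document totals are row sums here rather than column sums.

```r
library(tm)

corp <- VCorpus(VectorSource(c("hello world", "hello new world")))
dtm  <- DocumentTermMatrix(corp)   # default weighting: raw term counts

# Total number of terms per document (rows of a DTM are documents).
doc_totals <- slam::row_sums(dtm)

# A DTM is stored as a sparse triplet matrix: v holds the non-zero counts
# and i holds the row (document) index of each count.
tf <- dtm
tf$v <- as.numeric(dtm$v / doc_totals[dtm$i])

round(as.matrix(tf), 3)
```

Documents with no terms at all have no stored entries, so they stay as rows of zeros and never trigger a division by zero. Note that the weighting label stored on `tf` still describes raw term frequency, since only the values were changed.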