I used TermDocumentMatrix in R on documents (strings) that include single-letter words. After building the matrix, the terms do not include those single-letter words. Which control option should I pass as an input argument so that single-letter words are included in my term-document matrix?
By default, the minimum word length is 3. To override the default, pass wordLengths inside the control argument. Check out the following code.
library(tm)
docs <- c("This is a text","When Will u start", "1 12 123")
corpus <- Corpus(VectorSource(docs))
as.matrix(DocumentTermMatrix(corpus)) # words with length < 3 ('a', 'is', 'u', '1', '12') are excluded
#     Terms
# Docs 123 start text this when will
#    1   0     0    1    1    0    0
#    2   0     1    0    0    1    1
#    3   1     0    0    0    0    0
as.matrix(DocumentTermMatrix(corpus, control = list(wordLengths = c(1, Inf))))
#     Terms
# Docs 1 12 123 a is start text this u when will
#    1 0  0   0 1  1     0    1    1 0    0    0
#    2 0  0   0 0  0     1    0    0 1    1    1
#    3 1  1   1 0  0     0    0    0 0    0    0
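
The question asks about TermDocumentMatrix, while the code above uses DocumentTermMatrix. As a minimal sketch, assuming the same docs and corpus as above, the same control list can be passed to TermDocumentMatrix, which puts terms in rows and documents in columns. The variable names tdm and m are only illustrative.

```r
library(tm)

docs <- c("This is a text", "When Will u start", "1 12 123")
corpus <- Corpus(VectorSource(docs))

# Same control option as above, applied to TermDocumentMatrix
# (terms become rows, documents become columns)
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))
m <- as.matrix(tdm)

# List the terms; the short ones ('a', 'is', 'u', '1', '12') should now appear
rownames(m)
```

The lower bound in wordLengths can be raised to 2 if only single-character terms should be dropped.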