I'm using RTextTools for the first time. Here's my call to create_matrix:
library(RTextTools)
texts <- c("This is the first document.",
           "Is this a text?",
           "This is the second file.",
           "This is the third text.",
           "File is not this.")
doc_matrix <- create_matrix(texts, language = "english", removeNumbers = FALSE,
                            stemWords = TRUE, removeSparseTerms = .2)
I'm getting the following error and warnings:
Error in `[.simple_triplet_matrix`(matrix, , sort(colnames(matrix))) :
Invalid subscript type: NULL.
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(j) : is.na() applied to non-(list or vector) of type 'NULL'
I haven't seen anyone else post this error yet, so I figure I'm missing something very basic.
Peter
You need to remove the final argument, removeSparseTerms=.2.
From the tm package documentation on removeSparseTerms: "A term-document matrix where those terms from x are removed which have at least a sparse percentage of empty (i.e., terms occurring 0 times in a document) elements. I.e., the resulting matrix contains only terms with a sparse factor of less than sparse."
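I think the sparseness threshold is too low for your data set. With a threshold of .2, a term survives only if it is missing from fewer than 20% of the documents, which with five documents means it has to appear in all of them. Once stopwords such as "this" and "is" are dropped (which create_matrix does by default, as far as I can tell), no term meets that bar, so the matrix ends up with no columns. That would explain why colnames(matrix) is NULL in the error message.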
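To make the sparsity rule from the quoted documentation concrete, here is a rough base-R sketch that applies it to the five texts by hand. It does not call tm or RTextTools. The stopword list, the tokenizer, and the helper kept_at are assumptions made up for illustration, and stemming is skipped, so treat it as an approximation of what happens inside create_matrix rather than the actual implementation.

```r
# Toy re-creation of the "sparse factor less than sparse" rule, base R only.
texts <- c("This is the first document.",
           "Is this a text?",
           "This is the second file.",
           "This is the third text.",
           "File is not this.")

# Assumption: a small hand-picked stopword list standing in for the real one.
stopwords_assumed <- c("this", "is", "a", "the", "not")

# Lowercase, strip punctuation, split on whitespace, drop stopwords.
# setdiff() also de-duplicates, so each document yields its distinct terms.
tokens <- lapply(texts, function(d) {
  words <- strsplit(gsub("[[:punct:]]", "", tolower(d)), "\\s+")[[1]]
  setdiff(words, stopwords_assumed)
})

# Document frequency: in how many documents does each term occur?
terms <- sort(unique(unlist(tokens)))
doc_freq <- sapply(terms, function(term) {
  sum(sapply(tokens, function(tk) term %in% tk))
})

# Sparsity of a term = share of documents in which it does NOT occur.
sparsity <- 1 - doc_freq / length(texts)

# Hypothetical helper: terms whose sparsity is strictly below the threshold.
kept_at <- function(threshold) names(sparsity)[sparsity < threshold]

kept_at(0.2)  # nothing survives: no term occurs in all five documents
kept_at(0.7)  # "file" and "text", each present in 2 of 5 documents
kept_at(0.9)  # all six remaining terms
```

Under these assumptions, any threshold of .6 or below leaves no terms at all, which matches the empty matrix behind the error. If you do want sparse-term filtering on a corpus this small, a value close to 1 is the only kind that keeps anything.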