Before I updated RStudio, everything worked. Since the update, DocumentTermMatrix() in the 'tm' package behaves differently. I want to create a document-term matrix whose terms are numbers. For instance, if I have a .csv with one column as shown below:
x
1.01
11.21
123.35
212.11
I want the column names in my term matrix to look like this:
1.01 11.21 123.35 212.11
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
But instead it looks like this:
123 212
0 0
0 0
1 0
0 1
Here's the code that used to work:
corpus = Corpus(VectorSource(x))
dtm = DocumentTermMatrix(corpus)
dtm_df = as.data.frame(as.matrix(dtm))
Thanks in advance
From the 'tm' package maintainer Ingo Feinerer:
> Here's the code that used to work:
> corpus = Corpus(VectorSource(x))
Try VCorpus() instead of Corpus().
> dtm = DocumentTermMatrix(corpus)
> dtm_df = as.data.frame(as.matrix(dtm))
That is highly inefficient (since as.matrix() generates a dense representation from the sparse term-document matrix).
Best regards, Ingo
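For anyone landing here later, here is a minimal sketch of both suggestions, not an official recipe. It assumes the column is supplied as character strings (the vector x below is a stand-in for the CSV column) and a tm version in which Corpus() on a VectorSource returns a SimpleCorpus. As far as I understand, that class takes a different tokenization path than VCorpus(), which would explain why the decimal terms were split at the dot. Terms(), inspect(), and slam::col_sums() all work on the sparse matrix, so the as.matrix() step is only needed if a dense data frame is really required.

```r
library(tm)

# Stand-in for the CSV column from the question, kept as character strings
# (assumption: the real data would come from read.csv() and as.character()).
x <- c("1.01", "11.21", "123.35", "212.11")

# VCorpus() instead of Corpus(), as suggested in the reply.
corpus <- VCorpus(VectorSource(x))
dtm <- DocumentTermMatrix(corpus)

# Terms() returns the column names without building a dense matrix.
Terms(dtm)

# inspect() prints a small preview of the sparse matrix.
inspect(dtm)

# Column totals computed directly on the sparse representation
# (slam is the package tm uses for its sparse matrices).
slam::col_sums(dtm)
```

If a dense data frame is still needed for a small corpus like this one, as.data.frame(as.matrix(dtm)) works as before; the cost only matters once the corpus gets large.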