Tags: r, dataframe, type-conversion, tf-idf

How to convert Data Frame to Term Document Matrix in R?


I have a data frame, myTable, with a single column:

         sentence
1      it is a window
2      My name is john doe
3      Thank you
4      Good luck
.
.
.

I want to convert it to a Term Document Matrix in R. I did this:

tdm_s <- TermDocumentMatrix(Corpus(DataframeSource(myTable)))

but I got this error:

Error: all(!is.na(match(c("doc_id", "text"), names(x)))) is not TRUE

I searched for the error and couldn't find anything. How can I do this conversion?


Solution

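  • DataframeSource() expects a data frame whose first two columns are named doc_id and text, and that is the check failing in your error message. Since you have a single column of text, build the corpus from that column with VectorSource() instead: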

    ## Your sample data
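    ## stringsAsFactors = FALSE keeps the sentences as character strings on R < 4.0
    myTable <- data.frame(sentence = c("it is a window", "My name is john doe", "Thank you", "Good luck"), stringsAsFactors = FALSE)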
    
    ## Wrap the text column in VectorSource() so that each element becomes one document
    library(tm)
    myCorpus <- Corpus(VectorSource(myTable$sentence))
    tdm <- TermDocumentMatrix(myCorpus)
    
    inspect(tdm)
    #<<TermDocumentMatrix (terms: 8, documents: 4)>>
    #Non-/sparse entries: 8/24
    #Sparsity           : 75%
    #Maximal term length: 6
    #Weighting          : term frequency (tf)
    #Sample             :
    #         Docs
    #Terms   1 2 3 4
    #doe     0 1 0 0
    #good    0 0 0 1
    #john    0 1 0 0
    #luck    0 0 0 1
    #name    0 1 0 0
    #thank   0 0 1 0
    #window  1 0 0 0
    #you     0 0 1 0
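
  • If you would rather keep DataframeSource(), one option is to reshape the data frame so it has the two columns the error message asks for. This is only a sketch: the names docs, myCorpus2 and tdm2 are made up for illustration, and the doc_id values are just row numbers converted to strings.

```r
library(tm)

## Same sample data as above
myTable <- data.frame(
  sentence = c("it is a window", "My name is john doe", "Thank you", "Good luck"),
  stringsAsFactors = FALSE
)

## DataframeSource() wants the first column named doc_id and the second named text
docs <- data.frame(
  doc_id = as.character(seq_len(nrow(myTable))),  # assumed ids: "1", "2", ...
  text = myTable$sentence,
  stringsAsFactors = FALSE
)

myCorpus2 <- Corpus(DataframeSource(docs))
tdm2 <- TermDocumentMatrix(myCorpus2)

inspect(tdm2)
```

Any extra columns after text are kept as document-level metadata, so this route is mainly worth it when the data frame carries more than the text itself.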