r, text-mining, tm, corpus

Corpus object missing text


I'm working with the 'tm' library in R.

When applying this code:

abstract <- VectorSource(data$Abstract)

It works and gives this outcome:

[1] Accurate text...
[2] Accurate text...
[3] Accurate text...

Then I turn it into a Corpus object so I can run some cluster analysis on it later.

abstract <- tm::Corpus(tm::VectorSource(data$Abstract)) 

While checking the raw data, I found that every line comes out as NA when I turn the corpus into a data frame with this:

dataframe <- data.frame(text = unlist(sapply(abstract, `[`, "content")),
                        stringsAsFactors = FALSE)
text
1   NA
2   NA
3   NA
4   NA
5   NA
6   NA
7   NA
8   NA
Showing 1 to 8 of 23,600 entries, 1 total columns

So I don't understand how to correctly turn the text into a Corpus.


Solution

  • I'll be answering my own question with this:

    writeLines(as.character(abstract[[1]]))
    content(abstract[[1]])
    

    But I still don't know how to get the full column as output.
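
For the full column, one approach that may work is to call as.character on every document in the corpus instead of indexing each one with "content". This is only a minimal sketch: the small `data` frame below is a hypothetical stand-in for the real data$Abstract, and it assumes the corpus was built with tm::Corpus on a VectorSource, as in the question.

```r
library(tm)

# Hypothetical stand-in for the real data$Abstract column.
data <- data.frame(
  Abstract = c("First abstract text.",
               "Second abstract text.",
               "Third abstract text."),
  stringsAsFactors = FALSE
)

abstract <- tm::Corpus(tm::VectorSource(data$Abstract))

# Convert each document to a plain character string,
# then collect the strings into a one-column data frame.
dataframe <- data.frame(
  text = unname(sapply(abstract, as.character)),
  stringsAsFactors = FALSE
)

print(dataframe)
```

If this behaves as expected, dataframe$text holds one row per abstract. The NA values in the question probably come from subsetting each element with the name "content": when an element is a bare character string, indexing it by a name it does not have returns NA.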