Tags: nlp, text-mining, tm, corpus

Add metadata to VectorSource corpus using 'tm' library in R


I have a CSV file and I'm trying to convert it into a Corpus so I can use tm_map later and then apply some clustering.

I read the file

data <- read.csv("data.csv", header = TRUE, sep = ",",stringsAsFactors = FALSE)

Then I turn the column I need into a corpus

corp <- Corpus(VectorSource(data$text)) 

This is the outcome for the metadata

> meta(corp[[1]])
  author       : character(0)
  datetimestamp: 2019-09-20 20:48:45
  description  : character(0)
  heading      : character(0)
  id           : 1
  language     : en
  origin       : character(0)

Then I try to add the author info, so I can add the date and title afterwards, like this

> for(i in 1:length(corp)) {
+ corp[[i]]$meta$author == data$author[i]
+ }

but I keep on getting this

> corp[[1]]$meta$author
character(0)
> meta(corp[[1]], tag = 'author')
character(0)

when

> data$author[1]
[1] "Juan Vásquez Córdoba"

How can I add the right metadata info to my Corpus?


Solution

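  • I found the answer: the corpus object must be created with VCorpus instead of Corpus:

    corp <- VCorpus(VectorSource(data$text))

    With the V, the per-document metadata can be set. In recent tm versions, Corpus() on a VectorSource appears to return a SimpleCorpus, which does not keep metadata for individual documents. Note also that the loop has to assign with <- ; the == in the question only compares the two values and changes nothing.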
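
For reference, here is a minimal sketch of the whole flow with VCorpus. The small data frame is a hypothetical stand-in for the CSV in the question (it assumes `text` and `author` columns), and the second author name is made up for illustration. It sets the tag through `meta()`, which is one way to write the assignment, not necessarily what the asker ended up using.

```r
library(tm)

# Hypothetical stand-in for read.csv("data.csv", ...); assumed columns: text, author
data <- data.frame(
  text   = c("First document text.", "Second document text."),
  author = c("Juan Vásquez Córdoba", "Another Author"),
  stringsAsFactors = FALSE
)

# VCorpus stores each document as its own object, with its own metadata
corp <- VCorpus(VectorSource(data$text))

# Assign (<-, not ==) the author tag on every document
for (i in seq_len(length(corp))) {
  meta(corp[[i]], tag = "author") <- data$author[i]
}

# Read the tag back from the first document
first_author <- meta(corp[[1]], tag = "author")
print(first_author)
```

Other tags such as the date or title could be filled in the same loop with further `meta(corp[[i]], tag = ...) <- ...` lines, provided the matching columns exist in the data.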