I have two corpora, each containing the same number of documents:
library(tm)
c1 <- Corpus(VectorSource(c("document 1 corpus 1 text", "document 2 corpus 1 text")))
c2 <- Corpus(VectorSource(c("document 1 corpus 2 text", "document 2 corpus 2 text")))
I want a single corpus with the same number of documents, in which the text of corresponding documents is concatenated element-wise so that each pair forms one document. The result should be equivalent to:
c3 <- Corpus(VectorSource(c("document 1 corpus 1 text document 1 corpus 2 text",
                            "document 2 corpus 1 text document 2 corpus 2 text")))
Searching has turned up the tm_combine function, but that appends the documents of one corpus to the other, producing a single corpus whose document count is the sum of the individual counts (here, twice as many) rather than merging the documents pairwise.
You can iterate over both corpora in parallel with mapply, paste the content of each pair of corresponding documents together, and then convert the resulting character vector back into a corpus:
Corpus(VectorSource(
  mapply(function(x, y) paste(content(x), content(y)), c1, c2)
))
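The same idea can be wrapped in a small function that indexes the documents explicitly and checks that the two corpora have the same length. This is only a sketch: `combine_pairwise` and `doc_text` are hypothetical names, not part of tm, and it assumes each document is a plain text document whose text can be extracted with `as.character()`.

```r
library(tm)

# Hypothetical helper (not part of tm): merge two equal-length corpora pairwise.
combine_pairwise <- function(a, b) {
  # Pairwise merging only makes sense when both corpora have the same size.
  stopifnot(length(a) == length(b))

  # Extract the text of document i as a single string
  # (assumes plain text documents).
  doc_text <- function(corpus, i) {
    paste(as.character(corpus[[i]]), collapse = " ")
  }

  # Paste document i of `a` with document i of `b`, separated by a space.
  merged <- vapply(
    seq_len(length(a)),
    function(i) paste(doc_text(a, i), doc_text(b, i)),
    character(1)
  )

  # Convert the merged character vector back into a corpus.
  Corpus(VectorSource(merged))
}

c1 <- Corpus(VectorSource(c("document 1 corpus 1 text", "document 2 corpus 1 text")))
c2 <- Corpus(VectorSource(c("document 1 corpus 2 text", "document 2 corpus 2 text")))
c3 <- combine_pairwise(c1, c2)
```

Note that, like the mapply version, this builds a new corpus from plain text, so any document-level metadata attached to c1 or c2 is not carried over.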