I want to display all (or part) of the text of 400+ documents I have in a Corpus. To do so I used the function writeLines, but it doesn't return the actual text contained in the documents. Instead it returns this:
list(list(content = c("", ""), meta = list(author = character(0), datetimestamp = list(sec = 33.0082728862762, min = 22, hour = 12, mday = 5, mon = 8, year = 116, wday = 1, yday = 248, isdst = 0), description = character(0), heading......
This is my code:
library(tm)
library(SnowballC)
#Partition each cell in Excel into separate document
textdata <- read.csv("C:/Users/biat/Documents/survey/openanswers.csv", header = FALSE)
doc <- Corpus(DataframeSource(textdata), readerControl = list(language="swedish"))
writeLines(as.character(doc))
Does the problem lie in the R code or in the CSV file? When I use writeLines together with DirSource, it returns the text. Does anyone know how to suppress the metadata shown above and retrieve only the text of each document?
Try the following to print the text to your console, which is what you are asking for if I understand correctly:
library(tm)
data("crude") # example set from tm
output <- sapply(crude, function(x) x$content) # get the content of each document
cat(output) # print the text to the console
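The same idea can be sketched for a corpus built from your own text rather than the crude example set. This is only a minimal sketch: the answers vector is a hypothetical stand-in for the CSV contents (which are not available here), and it uses VectorSource instead of DataframeSource to keep the example self-contained. The point is to extract the content document by document instead of calling as.character on the whole corpus.

```r
library(tm)

# Hypothetical stand-in for the survey answers; the real CSV is not available here.
answers <- c("first open answer", "second open answer")

# Build a corpus with one document per answer.
doc <- VCorpus(VectorSource(answers))

# content() returns the text of a single document, so apply it per document
# rather than converting the corpus object as a whole.
texts <- sapply(doc, content)

# Print one document per line.
writeLines(texts)
```

To show only part of the corpus, subset texts before printing, for example writeLines(texts[1]).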
PS: Please try to supply a reproducible example with your questions.