I am new to R. I have some articles (.txt files) saved in a folder, and I can already import them into R. I have two methods for doing so, but I don't know which one is better.
Here is my code:
# 1
library(tm)
cname <- file.path("D:/magazine_pass")
docs <- Corpus(DirSource(cname), readerControl=list(reader=readPlain))
# 2
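dir.list <- list.files("D:/magazine_pass", full.names = TRUE)
articles <- vector("list", length(dir.list))  # one element per file, so nothing is overwritten
for (i in seq_along(dir.list)) {
  s <- readLines(dir.list[i], encoding = "ASCII")
  # iconv is vectorised; sub = "" drops bytes that are not valid ASCII
  articles[[i]] <- iconv(s, "ASCII", "ASCII", sub = "")
}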
I am also trying to use some biokeywords (e.g. clean energy, wearable device) to find which articles contain them.
How can I do that?
Please show me the code and briefly describe it. Thanks a lot.
label1 = subset(docs, grepl(paste(c("clean energy","wearable device"), collapse = "|"), docs))
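This should look through your corpus and pull out any entries that contain the keywords passed to grepl. paste(..., collapse = "|") joins the keywords into a single regular expression in which | means "or", so an entry matches if it contains either phrase. The basic grep function searches a character vector for elements that match the pattern provided. grepl does the same search but returns a logical vector of TRUE/FALSE, one value per element, indicating whether the pattern was matched.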
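
If you want to see the same grepl idea without the tm package, here is a minimal sketch in base R. It is only an illustration under a few assumptions: find_keyword_files is a hypothetical helper name, the demo files are created in a temporary folder rather than in D:/magazine_pass, matching is case-insensitive, and the keywords are plain phrases with no regular-expression metacharacters.

```r
# Hypothetical helper (not part of tm): returns the paths of the .txt files
# in `dir` whose text contains at least one of `keywords`.
find_keyword_files <- function(dir, keywords) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  # Collapse each file into one string so a phrase split across
  # two lines can still be matched.
  texts <- vapply(
    files,
    function(f) paste(readLines(f, warn = FALSE), collapse = " "),
    character(1)
  )
  # "clean energy|wearable device": | means "or" in a regular expression.
  pattern <- paste(keywords, collapse = "|")
  files[grepl(pattern, texts, ignore.case = TRUE)]
}

# Small demo with made-up articles in a temporary folder.
demo_dir <- file.path(tempdir(), "magazine_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Clean energy is growing.", "Solar leads."),
           file.path(demo_dir, "a.txt"))
writeLines("A story about cooking.", file.path(demo_dir, "b.txt"))
writeLines("The new wearable device ships soon.",
           file.path(demo_dir, "c.txt"))

hits <- find_keyword_files(demo_dir, c("clean energy", "wearable device"))
print(basename(hits))
```

To try it on the real articles, call find_keyword_files("D:/magazine_pass", c("clean energy", "wearable device")). If a keyword ever contains characters such as . or (, the pattern would need escaping first, since grepl treats it as a regular expression here.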