I am using R on Windows 10 x64 and trying to read a set of txt files into R for text analysis. I am using the following code:
library(tm)
setwd(inputdir)
files <- DirSource(directory = inputdir, encoding = "UTF-8")
docs <- VCorpus(x = files)
writeLines(as.character(docs[[2]]))
The last line is intended to show the content of document #2, but it comes out empty (as do all the other documents in the set). I am not sure why. I checked the encoding of the txt files (open, then choose "save as") and it is "Unicode". When I manually save any of the files as "ANSI", writeLines(as.character(docs[[2]])) gives me the proper content, so I thought I should convert all the files to ANSI. How can I do that in R for all txt files in my "inputdir"?
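# Get all txt files in the working directory
files <- list.files(path = getwd(), pattern = "\\.txt$", full.names = TRUE, recursive = FALSE)
# Source and target encodings. These are example values, adjust them to your files:
# "UTF-16LE" is what Windows usually labels "Unicode"; "UTF-8" matches the DirSource call.
from_encoding <- "UTF-16LE"
to_encoding <- "UTF-8"
# Loop over the files, convert the encoding, and overwrite each file
for (i in seq_along(files)) {
  con <- file(files[i], encoding = from_encoding)  # decode while reading
  input <- readLines(con)
  close(con)
  converted_input <- iconv(input, from = "", to = to_encoding)  # from = "" is the native encoding
  writeLines(converted_input, files[i], useBytes = TRUE)
}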
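If it helps, here is a self-contained sketch of a variant that lets the connections do the conversion instead of calling iconv() directly, checked on a temporary file. The function name convert_file_encoding and the sample text are made up for illustration, and I am assuming the input is UTF-16LE (what Windows usually labels "Unicode"); swap in whatever your files really use.

```r
# Hypothetical helper (not part of base R or tm): re-encode one text file in place.
convert_file_encoding <- function(path, from, to = "UTF-8") {
  # Reading through a connection lets R decode the source encoding
  con <- file(path, open = "r", encoding = from)
  lines <- readLines(con, warn = FALSE)
  close(con)

  # Writing through a connection re-encodes the text on the way out
  out <- file(path, open = "w", encoding = to)
  writeLines(lines, out)
  close(out)
  invisible(path)
}

# Build a small UTF-16LE sample file to try it on (assumed sample content)
sample_path <- tempfile(fileext = ".txt")
raw_bytes <- iconv("first line\nsecond line\n", from = "UTF-8",
                   to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(raw_bytes, sample_path)

convert_file_encoding(sample_path, from = "UTF-16LE", to = "UTF-8")
result <- readLines(sample_path, encoding = "UTF-8")
print(result)
```

To apply it to a whole folder, something like `for (f in files) convert_file_encoding(f, from = "UTF-16LE")` should do, but try it on a copy of the data first since the files are overwritten.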
The possible encodings can be listed with the iconvlist() command.
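As a small sketch of how that list can be used, the snippet below checks whether an encoding name is known on the current machine before it is passed on. The helper name is_known_encoding is made up, the case-insensitive comparison is only a convenience, and the names returned by iconvlist() differ between platforms, so treat the result as a hint rather than a guarantee.

```r
# Hypothetical helper: is this encoding name listed by iconvlist() on this machine?
is_known_encoding <- function(name) {
  toupper(name) %in% toupper(iconvlist())
}

# Check a candidate name, and a name that should not exist anywhere
print(is_known_encoding("UTF-16LE"))
print(is_known_encoding("not-a-real-encoding"))
```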