DirSource returning empty directory error despite correct file path


This seems like a very basic issue. The file path is valid and I can open the file by other means in R, but I want to use the tm package.

docs <- Corpus(DirSource("C:/Users/xyz/Work/test.corpus.txt", encoding = "UTF-8"))

This throws the following error:

Error in inherits(x, "Source") : empty directory

EDIT:

This works when I pass the directory instead of the file:

docs <- Corpus(DirSource("C:/Users/xyz/Work/", encoding = "UTF-8"))

Apparently DirSource expects a directory, so you cannot pass it an individual file name. One solution is to read the file via another method and then use another source type such as VectorSource.
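
A minimal sketch of that workaround, assuming the file is plain UTF-8 text and should become a single document. The temporary file is only a hypothetical stand-in for C:/Users/xyz/Work/test.corpus.txt so that the snippet runs on its own.

```r
library(tm)

# Hypothetical stand-in for the real file, written to a temp directory.
path <- file.path(tempdir(), "test.corpus.txt")
writeLines(c("First line of text.", "Second line of text."), path)

# Read the file with base R, then collapse the lines into one string
# so the corpus holds one document rather than one document per line.
lines <- readLines(path, encoding = "UTF-8")
text <- paste(lines, collapse = "\n")

# VectorSource treats each element of the character vector as a document.
docs <- Corpus(VectorSource(text))
length(docs)
```

If each line should be its own document instead, pass `lines` to VectorSource without collapsing it.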


Solution

  • You can specify a pattern so that DirSource only picks up the files whose names match it. The pattern is a regular expression, so pattern = "\\.txt$" selects all txt files. Or if you want just the one file, pattern = "test.corpus.txt" (the unescaped dots match any character, which is harmless here). Something like below.

    docs <- Corpus(DirSource("C:/Users/xyz/Work/", pattern = "test.corpus.txt", encoding = "UTF-8"))
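
Because the pattern is matched as a regular expression against file names, a stricter variant anchors it and escapes the dots. The sketch below uses a hypothetical temporary directory with two made-up files in place of C:/Users/xyz/Work/ to show that only the intended file is picked up.

```r
library(tm)

# Hypothetical demo directory containing the target file and one other file.
demo_dir <- tempfile("corpus_demo")
dir.create(demo_dir)
writeLines("target document", file.path(demo_dir, "test.corpus.txt"))
writeLines("other document", file.path(demo_dir, "other.txt"))

# ^ and $ anchor the match to the whole file name; \\. matches a literal dot.
src <- DirSource(demo_dir, pattern = "^test\\.corpus\\.txt$", encoding = "UTF-8")
docs <- Corpus(src)
length(docs)
```

With pattern = "\\.txt$" instead, both files in the demo directory would be read.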