Reading docvars from filenames with Quanteda


The documentation for quanteda says that this is the way to import text files from a folder and to read metadata from the filenames:

require(readtext)
mytf5 <- readtext("directory/*.txt",docvarsfrom="filenames", sep="-", docvarnames=c("Year", "President"))

I have these files in the directory:

[1] "1866-marx.txt"     "1910-weber.txt"    "1958-williams.txt"
[4] "1982-bell.txt"     "1998-lindgren.txt"

When using the code above, I get:

Error in file(f, ...) : unused argument (sep = "-")

This happens even though the filenames do use "-" as the separator.


Solution

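  • You're technically using the readtext package, not quanteda. We are about to submit readtext to CRAN, but for now it is only on GitHub, so I'm not sure which version you have. Either way, the call is wrong: the separator argument is named dvsep, not sep. Since sep is not an argument of readtext(), it gets passed on through ... and is rejected further down, which is the "unused argument" error you see. Here is the signature in the version I have: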

    > packageVersion("readtext")
    [1] ‘0.3’
    
    ?readtext::readtext
    
    readtext(file, ignore_missing_files = FALSE, textfield = NULL,
      docvarsfrom = c("metadata", "filenames", "filepaths"), dvsep = "_",
      docvarnames = NULL, encoding = NULL,
      verbosity = getOption("readtext_verbosity"), ...)
    

    So the command you need is:

    require(readtext)
    mytf5 <- readtext("directory/*.txt", docvarsfrom = "filenames", dvsep="-",
                      docvarnames = c("Year", "President"))
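
To see what splitting on "-" should give you for these file names, here is a minimal base-R sketch. It is only an illustration of the idea, not readtext's internal code, and the object names (files, stems, parts, dv) are my own. It drops the .txt extension, splits each name at the separator, and labels the two pieces the same way docvarnames does in the call above.

```r
# Illustration only (base R, no readtext needed): how "-" separated
# file names break down into two document variables.
files <- c("1866-marx.txt", "1910-weber.txt", "1958-williams.txt",
           "1982-bell.txt", "1998-lindgren.txt")

# Drop any directory part and the ".txt" extension
stems <- tools::file_path_sans_ext(basename(files))

# Split each stem at the literal "-" and stack the pieces into a matrix
parts <- do.call(rbind, strsplit(stems, "-", fixed = TRUE))

# Name the columns as in docvarnames = c("Year", "President")
dv <- data.frame(doc_id    = files,
                 Year      = as.integer(parts[, 1]),
                 President = parts[, 2],
                 stringsAsFactors = FALSE)

print(dv)
```

If the data frame returned by readtext() does not carry matching Year and President columns, the separator or the docvarnames are the first things to check.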