Tags: r, csv, escaping, double-quotes

How to read \" double-quote escaped values with read.table in R


I am having trouble reading a file in R that contains lines like the one below:

"_:b5507F4C7x59005","Fabiana D\"atri"

Any idea? How can I make read.table understand that \" is an escaped quote?

Cheers, Alexandre


Solution

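  • It seems to me that read.table/read.csv cannot handle backslash-escaped quotes (\") in comma-separated data: with a non-default sep such as sep=',', they expect an embedded quote to be doubled ("") instead.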

    ...but I think I have an (admittedly ugly) workaround inspired by @nullglob:

    • First read the file WITHOUT a quote character (quote=''). As @Ben Bolker noted, this won't handle commas embedded inside quoted fields.
    • Then go through the string columns, remove the surrounding quotes, and turn \" back into ".

    The test file looks like this (I added a non-string column for good measure):

    13,"foo","Fab D\"atri","bar"
    21,"foo2","Fab D\"atri2","bar2"
    

    And here is the code:

    # Generate test file
    writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
                 "21,\"foo2\",\"Fab D\\\"atri2\",\"bar2\"" ), "foo.txt")
    
    # Read ignoring quotes
    tbl <- read.table("foo.txt", as.is=TRUE, quote='', sep=',', header=FALSE, row.names=NULL)
    
    # Go through the character columns and clean them up
    for (i in seq_len(NCOL(tbl))) {
        if (is.character(tbl[[i]])) {
            x <- tbl[[i]]
            x <- sub('^"(.*)"$', '\\1', x)     # Remove surrounding quotes (only if present)
            tbl[[i]] <- gsub('\\\\"', '"', x)  # Unescape \" to "
        }
    }
    

    The output is then correct:

    > tbl
      V1   V2          V3   V4
    1 13  foo  Fab D"atri  bar
    2 21 foo2 Fab D"atri2 bar2
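A possible alternative, sketched here rather than taken from the answer above: because read.csv already understands CSV-style doubled quotes (""), one can rewrite every \" as "" with readLines() and gsub(fixed = TRUE), then hand the lines to read.csv(text = ...). Unlike reading with quote='', this keeps commas inside quoted fields intact. The helper name read_backslash_escaped_csv is hypothetical, and the sketch assumes a backslash in the file only ever appears as part of \" (a field that genuinely ends in a backslash would be mangled).

```r
# Hypothetical helper (not a base R function): convert backslash-escaped
# quotes (\") into CSV-style doubled quotes ("") and let read.csv parse them.
# Assumption: every backslash in the file is part of a \" escape.
read_backslash_escaped_csv <- function(file, header = FALSE, ...) {
  lines <- readLines(file)
  lines <- gsub('\\"', '""', lines, fixed = TRUE)  # literal \" -> ""
  read.csv(text = lines, header = header, stringsAsFactors = FALSE, ...)
}

# Test file: the same rows as above plus one field with an embedded comma
tmp <- tempfile(fileext = ".txt")
writeLines(c('13,"foo","Fab D\\"atri","bar"',
             '21,"foo2","Fab D\\"atri2","bar2"',
             '34,"a, b","Fab D\\"atri3","baz"'), tmp)

tbl <- read_backslash_escaped_csv(tmp)
print(tbl)
```

Because the substitution is applied blindly to every line, it is worth spot-checking a few rows containing quotes or embedded commas on real data.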