Search code examples
rreadr

How to use readr::write_delim() to write a .csv enclosed


I am trying to build a file in S3 using write_delim and I wanted it enclosed with double quotes(") however I don't know if it is not a parameter in write_delim function and I will need to use a base R function or if I am doing it incorrectly. Here is what I tried

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              quote = "double",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)




Solution

  • If I understand you correctly, you want to write a csv to your S3 bucket that includes one quote at the start and one quote at the end.

    From the s3write_using documents:

    FUN: For s3write_using, a function to which x and a file path will be passed (in that order).

    So all you need is to define a function that will take an R object as its first argument and write a quote-enclosed csv string to the path passed as the second argument.

    If you are really worried about optimization issues, readr::write_delim is certainly faster than write.csv, but the data.table library has an even faster function, fwrite, which allows quoting in the same way as write.csv

    write_quoted_csv <- function(object, path)
    {
      data.table::setDT(object)
      data.table::fwrite(object, path, quote = TRUE)
      data.table::setDF(object)
    }
    

    Let's test it against write_delim using a dataframe with 100,000 rows:

    df <- data.frame(a = 1:50000, 
                     b = 50001:100000, 
                     c = rep(LETTERS[1:10], each = 5000))
    
    microbenchmark::microbenchmark(
      readr      = readr::write_delim(df, "~/test_readr.csv", delim = ",", na = ""),
      data.table = write_quoted_csv(df, "~/test_datatable.csv"), 
      times      = 100)
    # Unit: milliseconds
    #        expr       min       lq      mean    median        uq       max neval
    #       readr 244.87593 257.6236 276.91877 262.86998 283.07285 416.79254   100
    #  data.table  20.80768  22.8940  26.25808  24.92915  27.69624  54.55789   100
    

    You can see that the data.table method is over 10 times faster. Even then, write_delim doesn't put the quotes in, whereas fwrite does:

    cat(readLines("~/test_readr.csv", 10), sep = "\n")
    #> a,b,c
    #> 1,50001,A
    #> 2,50002,A
    #> 3,50003,A
    #> 4,50004,A
    #> 5,50005,A
    #> 6,50006,A
    #> 7,50007,A
    #> 8,50008,A
    #> 9,50009,A
    cat(readLines("~/test_datatable.csv", 10), sep = "\n")
    #> "a","b","c"
    #> 1,50001,"A"
    #> 2,50002,"A"
    #> 3,50003,"A"
    #> 4,50004,"A"
    #> 5,50005,"A"
    #> 6,50006,"A"
    #> 7,50007,"A"
    #> 8,50008,"A"
    #> 9,50009,"A"
    

    So, with a super-fast method to pass, you can write your s3 file this way:

    s3write_using(file_filtered,
                  FUN = write_quoted_csv,
                  object = paste0(output_path, "file-", lubridate::today(), ".csv"),
                  bucket = input_bucket)