I am trying to build a file in S3 using write_delim, and I want the fields enclosed in double quotes ("). I don't know whether write_delim simply lacks such a parameter (in which case I would need a base R function instead) or whether I am using it incorrectly. Here is what I tried:
s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              quote = "double",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)
If I understand you correctly, you want to write a csv to your S3 bucket with the fields enclosed in double quotes.
From the s3write_using documentation:
FUN: For s3write_using, a function to which x and a file path will be passed (in that order).
So all you need is to define a function that will take an R object as its first argument and write a quote-enclosed csv string to the path passed as the second argument.
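For example, a minimal sketch of such a function can be built on base write.csv, which already quotes character/factor fields and the header by default (quote = TRUE). The name write_quoted is purely illustrative:

```r
# Minimal sketch: a two-argument function (object, path) that s3write_using
# can call. Base write.csv quotes character/factor fields and the header
# by default, so no extra quoting argument is needed.
write_quoted <- function(x, path) {
  utils::write.csv(x, path, row.names = FALSE, na = "")
}
```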
If you are worried about performance, readr::write_delim is certainly faster than write.csv, but the data.table package has an even faster function, fwrite, which allows quoting in the same way as write.csv:
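As an aside: if you are on a recent readr (2.0.0 or later), write_delim itself gained a quote argument ("needed", "all", or "none"); quote = "all" should enclose every field in double quotes. Check your installed version, as older releases did not have this argument:

```r
# Assumes readr >= 2.0.0, where write_delim() accepts quote = "all".
library(readr)

df <- data.frame(a = 1:3, b = c("x", "y", "z"))
path <- tempfile(fileext = ".csv")
write_delim(df, path, delim = ",", na = "", quote = "all")
```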
write_quoted_csv <- function(object, path) {
  data.table::setDT(object)                       # convert to data.table by reference (no copy)
  data.table::fwrite(object, path, quote = TRUE)  # quote = TRUE encloses character fields and the header in double quotes
  data.table::setDF(object)                       # convert back so the caller's object is unchanged
}
Let's test it against write_delim using a dataframe with 50,000 rows:
df <- data.frame(a = 1:50000,
                 b = 50001:100000,
                 c = rep(LETTERS[1:10], each = 5000))

microbenchmark::microbenchmark(
  readr      = readr::write_delim(df, "~/test_readr.csv", delim = ",", na = ""),
  data.table = write_quoted_csv(df, "~/test_datatable.csv"),
  times = 100)
# Unit: milliseconds
# expr min lq mean median uq max neval
# readr 244.87593 257.6236 276.91877 262.86998 283.07285 416.79254 100
# data.table 20.80768 22.8940 26.25808 24.92915 27.69624 54.55789 100
You can see that the data.table method is over 10 times faster. More importantly, write_delim doesn't put the quotes in, whereas fwrite does:
cat(readLines("~/test_readr.csv", 10), sep = "\n")
#> a,b,c
#> 1,50001,A
#> 2,50002,A
#> 3,50003,A
#> 4,50004,A
#> 5,50005,A
#> 6,50006,A
#> 7,50007,A
#> 8,50008,A
#> 9,50009,A
cat(readLines("~/test_datatable.csv", 10), sep = "\n")
#> "a","b","c"
#> 1,50001,"A"
#> 2,50002,"A"
#> 3,50003,"A"
#> 4,50004,"A"
#> 5,50005,"A"
#> 6,50006,"A"
#> 7,50007,"A"
#> 8,50008,"A"
#> 9,50009,"A"
So, with a fast quote-writing function to pass as FUN, you can write your S3 file this way:
s3write_using(file_filtered,
              FUN = write_quoted_csv,
              object = paste0(output_path, "file-", lubridate::today(), ".csv"),
              bucket = input_bucket)