Tags: r, data.table, fwrite, fread, disk.frame

What's the best way to write a disk frame to CSV?


I'm looking through the docs and I don't see a function for writing to CSV.

There appears to be a function for writing the disk frame, but it's unclear what format the data gets stored in:

write_disk.frame

Write a data.frame/disk.frame to a disk.frame location. If df is a data.frame then using the as.disk.frame function is recommended for most cases.

Can I use fwrite or write_csv with a disk frame?


Solution

  • I see. I might add the write to csv functionality as I see this request quite often.

    The best way to keep track, though, is to submit an issue on GitHub at https://github.com/xiaodaigh/disk.frame/issues. I have done that this time; see https://github.com/xiaodaigh/disk.frame/issues/311

    If you want to write each chunk to a separate CSV, just do:

    df %>%
      cimap(function(chunk, id) {
        # chunk is the chunk's data, id is the chunk's id
        data.table::fwrite(chunk, file.path("some/path", paste0(id, ".csv")))
        NULL # return NULL since you don't need to return anything
      }, lazy = FALSE)
    

    E.g.

    library(disk.frame)
    
    a = as.disk.frame(nycflights13::flights)
    
    cimap(a, function(chunk, id) {
      data.table::fwrite(chunk, file.path(tempdir(), paste0(id, ".csv")))
      NULL
    }, lazy=FALSE)
    
    
    dir(tempdir())
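
As a rough check on the per-chunk approach above, the sketch below writes the CSVs into a dedicated folder and reads them back to confirm no rows were lost. It assumes the same `cimap(..., lazy = FALSE)` behaviour used above; the `mtcars` input, the `nchunks = 4` split, and the `chunk_csvs` folder name are illustrative choices only, not part of the original answer.

```r
library(disk.frame)

# Illustrative input: the built-in mtcars data (32 rows) split into 4 chunks.
a <- as.disk.frame(mtcars, nchunks = 4, overwrite = TRUE)

# Hypothetical output folder, kept apart from the other files in tempdir().
out_dir <- file.path(tempdir(), "chunk_csvs")
unlink(out_dir, recursive = TRUE)  # start from an empty folder
dir.create(out_dir)

# One CSV per chunk, named after the chunk id.
cimap(a, function(chunk, id) {
  data.table::fwrite(chunk, file.path(out_dir, paste0(id, ".csv")))
  NULL
}, lazy = FALSE)

# Read every CSV back and stack the pieces into one data.table.
csv_files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- data.table::rbindlist(lapply(csv_files, data.table::fread))

# The stacked CSVs should hold as many rows as the disk.frame.
nrow(combined) == nrow(a)
```

Writing to a separate folder also keeps the listing clean, since `tempdir()` usually holds other temporary files as well.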
    

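    If you wish to write everything to one file instead, pass append = TRUE to fwrite so that each chunk is added to the same file. Make sure you turn off multiple workers first, so that chunks are not written to the file at the same time!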

    setup_disk.frame(workers = 1) # only one worker
    cmap(a, function(chunk) {
      data.table::fwrite(chunk, file.path(tempdir(), "one_file.csv"), append = TRUE)
      NULL
    }, lazy=FALSE)
    setup_disk.frame() # turn multi worker back on 
    
    
    dir(tempdir())
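
An alternative sketch for the single-file case, if you would rather not change the worker setup: loop over the chunks in the current R session with `get_chunk()` and `nchunks()`. This is only one possible approach, not the package author's recommendation; it assumes the chunks carry the default numeric names that `as.disk.frame` produces, and the `mtcars` input and `one_file_loop.csv` name are made up for illustration.

```r
library(disk.frame)

# Illustrative input: the built-in mtcars data (32 rows) split into 4 chunks.
a <- as.disk.frame(mtcars, nchunks = 4, overwrite = TRUE)

# Hypothetical output file.
out_file <- file.path(tempdir(), "one_file_loop.csv")

# Chunks are read and written one at a time in this session,
# so nothing writes to the file in parallel.
for (i in seq_len(nchunks(a))) {
  chunk <- get_chunk(a, i)
  # The first chunk creates (or overwrites) the file and writes the header;
  # later chunks are appended, and fwrite skips the header for them.
  data.table::fwrite(chunk, out_file, append = i > 1)
}

# Read the result back to confirm every row arrived.
written <- data.table::fread(out_file)
nrow(written) == nrow(a)
```

Because the first chunk is written with `append = FALSE`, rerunning the loop replaces an existing file rather than adding a second copy of the data to it.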