For saving a single large object in R, is saveRDS or save faster?


I currently have a very large array (500 elements, each with a 1000 by 20 matrix). I have been using saveRDS to save objects. However, it consistently takes a very long time to do so. I am wondering if save() is faster, or if there are options in each to save things faster? Thanks.


Solution

  • You can always spelunk a bit in the sources:

    saveRDS():

    function (object, file = "", ascii = FALSE, version = NULL, compress = TRUE, 
        refhook = NULL) {
    ...
      .Internal(serializeToConn(object, con, ascii, version, refhook))
    }
    

    This eventually reaches the C implementation: https://github.com/wch/r-source/blob/2c3e0e757e81ca23c34da8dde4ff925bd9d275f0/src/main/serialize.c#L2471-L2536

    save():

    function (..., list = character(), file = stop("'file' must be specified"), 
        ascii = FALSE, version = NULL, envir = parent.frame(), compress = isTRUE(!ascii), 
        compression_level, eval.promises = TRUE, precheck = TRUE) {
    ...
      .Internal(saveToConn(list, con, ascii, version, envir,  eval.promises))
    }
    

    This eventually reaches the C implementation: https://github.com/wch/r-source/blob/6ac8f58c608337200f85ea47cba2abc717be6eb5/src/main/saveload.c#L1973-L2041

    OR

    run a benchmark (assuming the data is a list of matrix objects):

    library(microbenchmark)
    
    set.seed(0)
    
    lapply(1:500, function(i) {
      matrix(sample(20*1000), nrow = 1000, ncol = 20)
    }) -> matrix_list
    
    # str() prints its summary itself and returns NULL, so no print() is needed
    str(matrix_list, list.len = 5)
    ## List of 500
    ##  $ : int [1:1000, 1:20] 17934 5310 7442 11456 18161 4033 17963 18887 13211 12577 ...
    ##  $ : int [1:1000, 1:20] 2227 4212 2296 2907 6198 3005 10531 2358 9543 15374 ...
    ##  $ : int [1:1000, 1:20] 5969 11861 11057 11933 7852 17959 14794 530 16811 17003 ...
    ##  $ : int [1:1000, 1:20] 1073 14634 12948 16282 2087 6687 7992 7640 18482 8043 ...
    ##  $ : int [1:1000, 1:20] 10900 8249 6059 10767 15541 17139 11663 9010 576 14900 ...
    ##   [list output truncated]
    
    pryr::object_size(matrix_list)
    ## 40.1 MB
    
    microbenchmark(
      save = save(matrix_list, file = "/tmp/out.rda"),
      saveRDS = saveRDS(matrix_list, file = "/tmp/out.rds"),
      times = 5,
      control = list(warmup = 2)
    ) -> mb
    
    mb
    ## Unit: seconds
    ##     expr      min       lq     mean   median       uq       max neval
    ##     save 8.571138 8.578461 8.747248 8.650629 8.665557  9.270453     5
    ##  saveRDS 8.647355 8.655231 9.298947 8.684998 8.772102 11.735052     5
    

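    In this run the two are essentially tied (median of about 8.65 s for save() versus 8.68 s for saveRDS()), so the choice of function is unlikely to matter much here. You can play with the compress and compression_level arguments of save(), or with the compress argument of saveRDS(), to see if changing or removing compression helps. saveRDS() has no compression-level argument of its own, but you can pass it a connection such as gzfile() (compression argument) or gzcon() (level argument) as file; compress is ignored in that case.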
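
As a rough sketch of that experiment, the code below writes the same object with several compression settings and records elapsed time and file size. It assumes a smaller stand-in list (50 matrices rather than 500) so it runs quickly, and the temporary paths and variable names are made up for illustration. Timings depend on the machine and disk, so treat any numbers it prints as indicative only.

```r
# Smaller stand-in for the benchmark data so the sketch runs quickly.
set.seed(0)
small_list <- lapply(1:50, function(i) {
  matrix(sample(20 * 1000), nrow = 1000, ncol = 20)
})

# Hypothetical temporary output paths, one per setting.
paths <- c(
  rds_gzip = tempfile(fileext = ".rds"),  # saveRDS(), default gzip
  rds_none = tempfile(fileext = ".rds"),  # saveRDS(), no compression
  rds_fast = tempfile(fileext = ".rds"),  # saveRDS(), gzip level 1
  rda_none = tempfile(fileext = ".rda"),  # save(), no compression
  rda_fast = tempfile(fileext = ".rda")   # save(), gzip level 1
)

t_rds_gzip <- system.time(saveRDS(small_list, paths[["rds_gzip"]]))

t_rds_none <- system.time(
  saveRDS(small_list, paths[["rds_none"]], compress = FALSE)
)

# saveRDS() ignores 'compress' when 'file' is a connection, so the
# compression level is set on the gzfile() connection instead.
t_rds_fast <- system.time({
  con <- gzfile(paths[["rds_fast"]], open = "wb", compression = 1)
  saveRDS(small_list, con)
  close(con)
})

t_rda_none <- system.time(
  save(small_list, file = paths[["rda_none"]], compress = FALSE)
)

t_rda_fast <- system.time(
  save(small_list, file = paths[["rda_fast"]], compression_level = 1)
)

results <- data.frame(
  setting = names(paths),
  elapsed = c(t_rds_gzip[["elapsed"]], t_rds_none[["elapsed"]],
              t_rds_fast[["elapsed"]], t_rda_none[["elapsed"]],
              t_rda_fast[["elapsed"]]),
  bytes   = unname(file.size(paths))
)
print(results)

# Sanity check: the level-1 file still round-trips to the same object.
stopifnot(identical(readRDS(paths[["rds_fast"]]), small_list))
```

If the uncompressed or level-1 rows turn out clearly faster on your machine, the trade-off is a larger file on disk, which the bytes column makes visible.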