
Can you convert an R raw vector representing an RDS file back into an R object without a round trip to disk?


I have an RDS file that is uploaded and then downloaded via curl::curl_fetch_memory() (via httr), which gives me a raw vector in R.

Is there a way to read that raw vector representing the RDS file to return the original R object? Or does it always have to be written to disk first?

My setup is similar to this:

saveRDS(mtcars, file = "obj.rds")
# upload the obj.rds file 
...
# download it again via httr::write_memory()
...

obj
#   [1] 1f 8b 08 00 00 00 00 00 00 03 ad 56 4f 4c 1c 55 18 1f ca 02 bb ec b2 5d 
# ...
is.raw(obj)
#[1] TRUE
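To reproduce this without an upload server, the downloaded bytes can be simulated by reading the saved file back with readBin(). This is a sketch that assumes the download returns the file's bytes unchanged; the temporary path stands in for the elided upload and download steps.

```r
# Write the object as usual (gzip-compressed by default)
path <- tempfile(fileext = ".rds")
saveRDS(mtcars, file = path)

# Read the file's bytes back as a raw vector, standing in for what
# curl_fetch_memory() returns when the file is transferred unchanged
obj <- readBin(path, what = "raw", n = file.size(path))
is.raw(obj)
obj[1:2]  # the first two bytes of a gzip stream: 1f 8b
```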

It seems readRDS() should be used to decompress it, but it needs a file path or a connection object, and I don't know how to make a connection from an R raw vector. rawConnection() looked promising but gave:

rawConnection(obj)
#A connection with                           
#description "obj"          
#class       "rawConnection"
#mode        "r"            
#text        "binary"       
#opened      "opened"       
#can read    "yes"          
#can write   "no"     
readRDS(rawConnection(obj))
#Error in readRDS(rawConnection(obj)) : unknown input format
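The error appears to come from the compression rather than from rawConnection() itself: given a file path, readRDS() opens it with gzfile(), which decompresses transparently, but a raw connection is passed to the deserializer as-is, so it sees gzip bytes instead of a serialization header. A sketch that checks this, again simulating the download with readBin() on temporary files (an assumption standing in for the real transfer): the same data saved with compress = FALSE reads fine through a plain rawConnection().

```r
# Save the same object twice: gzip-compressed (the default) and uncompressed
gz_path <- tempfile(fileext = ".rds")
plain_path <- tempfile(fileext = ".rds")
saveRDS(mtcars, gz_path)
saveRDS(mtcars, plain_path, compress = FALSE)

# Read both files as raw vectors, standing in for the downloaded bytes
gz <- readBin(gz_path, what = "raw", n = file.size(gz_path))
plain <- readBin(plain_path, what = "raw", n = file.size(plain_path))

gz[1:2]                # gzip magic bytes, not a serialization header
rawToChar(plain[1:2])  # "X\n": header of R's binary (XDR) serialization

# Without compression, a plain raw connection is enough
con <- rawConnection(plain)
res_plain <- readRDS(con)
close(con)
identical(res_plain, mtcars)
```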

Looking through the source of readRDS(), it seems to open files with gzfile() underneath, but I couldn't get that to work with the raw vector either.

If it's downloaded via httr::write_disk() -> curl::curl_fetch_disk() -> readRDS(), then it all works, but that is a round trip to disk, and I wondered whether it could be optimised for big files.
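For comparison, the disk round trip and its in-memory counterpart can be sketched with the curl package directly. This is a hedged sketch: the function names are hypothetical, `url` is a placeholder for wherever obj.rds was uploaded, and only curl::curl_fetch_disk() and curl::curl_fetch_memory(), the calls named in the question, are used.

```r
# Hypothetical helper: the current approach, downloading to a temporary
# file and letting readRDS() (via gzfile()) decompress it
read_rds_via_disk <- function(url) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  curl::curl_fetch_disk(url, path)
  readRDS(path)
}

# Hypothetical helper: the in-memory download, where the response body
# arrives as a raw vector in $content (the `obj` from the question)
fetch_rds_bytes <- function(url) {
  res <- curl::curl_fetch_memory(url)
  # Assumes an HTTP(S) URL; stop rather than return an error page's bytes
  stopifnot(res$status_code == 200)
  res$content
}
```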


Solution

  • By default, RDS files are gzip-compressed. To read one from a raw connection, you need to wrap the connection in gzcon() manually:

    con = rawConnection(obj)
    result = readRDS(gzcon(con))
    

    This works even when the stream isn't gzipped, but it fails if the RDS file was created with a different supported compression method (e.g. 'bzip2' or 'xz'), and R doesn't seem to have a gzcon() equivalent for those. For such files, you can decompress the data manually and use unserialize() instead of readRDS():

    result = unserialize(memDecompress(obj))
    

    This works for any data produced by saveRDS(). It might fail for objects created manually via memCompress(serialize(…, NULL)), because memCompress() does not necessarily write a compression header from which memDecompress() can detect the compression method.
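The two approaches can be combined into one helper. A minimal sketch, assuming the raw vector holds the complete, unmodified bytes of a file written by saveRDS(); the function names read_rds_raw() and rds_compression() are hypothetical, and the compression is identified by checking the standard gzip, bzip2 and xz magic bytes explicitly rather than relying on memDecompress()'s own detection.

```r
# Hypothetical helper: identify the compression from the leading magic bytes
rds_compression <- function(bytes) {
  starts_with <- function(magic) {
    length(bytes) >= length(magic) && identical(bytes[seq_along(magic)], magic)
  }
  if (starts_with(as.raw(c(0x1f, 0x8b)))) return("gzip")
  if (starts_with(charToRaw("BZh"))) return("bzip2")
  if (starts_with(as.raw(c(0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00)))) return("xz")
  "none"
}

# Hypothetical helper: raw bytes of an RDS file -> original R object, in memory
read_rds_raw <- function(bytes) {
  stopifnot(is.raw(bytes))
  type <- rds_compression(bytes)
  if (type == "gzip") {
    # gzcon() layers gzip decompression over the raw connection;
    # closing it also closes the underlying rawConnection()
    con <- gzcon(rawConnection(bytes))
    on.exit(close(con))
    return(readRDS(con))
  }
  # bzip2/xz: decompress the whole vector first; "none" is used as-is
  if (type != "none") bytes <- memDecompress(bytes, type = type)
  unserialize(bytes)
}
```

With the downloaded vector from the question, usage would be `result <- read_rds_raw(obj)`. Checking the magic bytes up front means files saved with compress = FALSE go straight to unserialize().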