
How can I correctly parse a byte stream in R?


I am accessing an API which returns a long series of raw bytes.

My question doesn't lend itself to an easy reprex of the API itself, but here is my best attempt:

raw_bytes <- as.raw(c(0x43, 0xb7, 0x01, 0x48, 0x43, 0xb7, 0x01, 0x48,
                      0x43, 0xb7, 0x01, 0x48, 0x43, 0xb7, 0x01, 0x48,
                      0x3f, 0x80, 0x00, 0x00,
                      0x00, 0x00, 0x01, 0x5e, 0xa9, 0x3e, 0x83, 0x80))

> str(raw_bytes)
 raw [1:28] 43 b7 01 48 ...

Now, from the API documentation, I know that this 28-byte chunk is to be parsed as follows, in big-endian byte order:

bytes  type
4      float
4      float
4      float
4      float
4      float
8      long integer (to become a date object, defined as milliseconds since Jan 1, 1970)
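The layout above maps onto fixed byte offsets: bytes 1-20 hold the five floats and bytes 21-28 hold the timestamp. Below is a base-R sketch of a one-record parser; `parse_record()` is a hypothetical helper, not part of any package. It relies on two things: `readBin()` accepts a raw vector in place of a connection, and base R has no 64-bit integer type, so the 8-byte field is read as four unsigned 16-bit chunks and recombined as a double. That is exact only while the value stays below 2^53 (a 2017 millisecond timestamp is about 1.5e12), and it assumes the field is non-negative.

```r
# Hypothetical helper: parse one 28-byte record laid out as five big-endian
# 4-byte floats followed by one big-endian 8-byte integer (ms since epoch).
parse_record <- function(rec) {
  stopifnot(is.raw(rec), length(rec) == 28)
  # readBin() accepts a raw vector in place of a connection
  values <- readBin(rec[1:20], "double", size = 4, n = 5, endian = "big")
  # No 64-bit integers in base R: read the last 8 bytes as four unsigned
  # 16-bit chunks (signed = FALSE is honoured for sizes 1 and 2)
  chunks <- readBin(rec[21:28], "integer", size = 2, n = 4,
                    endian = "big", signed = FALSE)
  # Recombine in double precision; exact while the value is below 2^53
  ms <- sum(chunks * 65536^(3:0))
  list(values = values,
       ms = ms,
       time = as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC"))
}

raw_bytes <- as.raw(c(0x43, 0xb7, 0x01, 0x48, 0x43, 0xb7, 0x01, 0x48,
                      0x43, 0xb7, 0x01, 0x48, 0x43, 0xb7, 0x01, 0x48,
                      0x3f, 0x80, 0x00, 0x00,
                      0x00, 0x00, 0x01, 0x5e, 0xa9, 0x3e, 0x83, 0x80))
rec <- parse_record(raw_bytes)
sprintf("%.0f", rec$ms)
# [1] "1506078000000"
rec$time
# [1] "2017-09-22 11:00:00 UTC"
```

Note that the reprex bytes end in `a9 3e 83 80` rather than the `a9 62 38 20` printed further down, so this gives 1506078000000 rather than 1506080340000.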

writeBin(raw_bytes, "myfile.txt")

con <- file("myfile.txt", "rb") # create connection object; specify raw binary

> readBin(con, "double", size = 4, n = 5, endian = "big") # get those first 5 objects from the chunk
[1] 366.00 366.00 365.75 366.00  10.70

So far so good; these are consistent with what I would expect.

> readBin(con, "integer", size = 8, n = 1, endian = "big") # get the last 8 byte chunk
[1] -1453180896

Hmmm, that looks wrong. An online 8-byte hex converter gives 1506080340000 as the correct decimal value, which matches the date I would expect (Sept 22, 2017).
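R's integer type is 32 bits wide, so `readBin(..., "integer", size = 8)` cannot return the whole value. The printed result is consistent with only the low-order four bytes (`a9 62 38 20`, from the tail printed below) being kept and read as a signed 32-bit integer. A quick arithmetic check in base R (`low4` is just an illustrative name):

```r
# The low-order 4 bytes of the timestamp, as printed in the tail below
low4 <- as.raw(c(0xa9, 0x62, 0x38, 0x20))
readBin(low4, "integer", size = 4, endian = "big")
# [1] -1453180896

# Adding 2^32 turns that signed value back into the unsigned 0xa9623820
-1453180896 + 2^32 == 0xa9623820
# [1] TRUE

# The full value also needs the high-order bytes 00 00 01 5e (= 0x15e)
sprintf("%.0f", 0x15e * 2^32 + 0xa9623820)
# [1] "1506080340000"
```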

Taking a closer look at those last 8 bytes:

> (con2 <- tail(raw_bytes, 8))

[1] 00 00 01 5e a9 62 38 20

And trying a few different stabs at readBin():

> readBin(con2, "double", size = 8, n = 1, endian = "big")
[1] 7.441026e-312

> readBin(con2, "numeric", size = 8, n = 1, endian = "little")
[1] 1.818746e-153

> readBin(con2, "integer", size = 8, n = 1, endian = "little")
[1] 1577123840

Nope.
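The `"double"` attempt fails for a different reason: it reinterprets the 64 bits as an IEEE 754 double. The top 12 bits (sign and exponent) of `00 00 01 5e ...` are all zero, which makes the double subnormal, and a subnormal's value is its 52-bit fraction field times 2^-1074. Here that fraction field is exactly the integer we want, hence the tiny 7.441026e-312. A base-R check:

```r
bytes <- as.raw(c(0x00, 0x00, 0x01, 0x5e, 0xa9, 0x62, 0x38, 0x20))
as_double <- readBin(bytes, "double", size = 8, endian = "big")
as_double
# [1] 7.441026e-312

# All-zero exponent bits: the value is fraction * 2^-1074 (a subnormal)
identical(as_double, 1506080340000 * 2^-1074)
# [1] TRUE

# Dividing the scale back out recovers the integer exactly
sprintf("%.0f", as_double / 2^-1074)
# [1] "1506080340000"
```

This bit-level reinterpretation is the same effect the `class(a) = "integer64"` answer below relies on; it is shown here to explain the output, not as a recommended way to decode the field.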

I can produce the expected decimal number from these bytes using an outside library:

str <- paste(con2, collapse = "")

> bit64::as.integer64(as.numeric(paste0("0x",str)))
integer64
[1] 1506080340000
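The inner `as.numeric(paste0("0x", str))` call is already base R; bit64 only changes how the result is stored and printed. Because a double represents every integer up to 2^53 exactly, a sketch without bit64 could look like this (the `< 2^53` guard is an added safety check, not something the API documents):

```r
bytes <- as.raw(c(0x00, 0x00, 0x01, 0x5e, 0xa9, 0x62, 0x38, 0x20))
# "0000015ea9623820": as.numeric() accepts a hexadecimal string with a 0x prefix
ms <- as.numeric(paste0("0x", paste(bytes, collapse = "")))
# Beyond 2^53 a double can no longer represent every integer exactly
stopifnot(ms < 2^53)
sprintf("%.0f", ms)
# [1] "1506080340000"
```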

Anyway, here's my question: is there a way to properly parse my byte stream using base R, particularly readBin()?

And, more generally, is there an idiomatic way to parse a stream of bytes in an R session?
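One common pattern for a stream of fixed-size records is to read exactly one record's worth of raw bytes at a time from a connection, decode it, and stop when a short read signals the end. The sketch below assumes every record follows the 28-byte layout above; `rawConnection()` stands in for the real API stream, which you would replace with whatever binary connection your HTTP client provides (not shown, and untested here).

```r
# Stand-in for the API stream: the question's 28-byte reprex record, twice
record <- as.raw(c(0x43, 0xb7, 0x01, 0x48, 0x43, 0xb7, 0x01, 0x48,
                   0x43, 0xb7, 0x01, 0x48, 0x43, 0xb7, 0x01, 0x48,
                   0x3f, 0x80, 0x00, 0x00,
                   0x00, 0x00, 0x01, 0x5e, 0xa9, 0x3e, 0x83, 0x80))
con <- rawConnection(c(record, record))

record_size <- 28  # 5 x 4-byte floats + 1 x 8-byte integer
rows <- list()
repeat {
  rec <- readBin(con, "raw", n = record_size)
  if (length(rec) < record_size) break  # end of stream or a partial record
  floats <- readBin(rec[1:20], "double", size = 4, n = 5, endian = "big")
  # 8-byte big-endian integer as four unsigned 16-bit chunks (exact below 2^53)
  chunks <- readBin(rec[21:28], "integer", size = 2, n = 4,
                    endian = "big", signed = FALSE)
  rows[[length(rows) + 1]] <- c(floats, sum(chunks * 65536^(3:0)))
}
close(con)

parsed <- do.call(rbind, rows)  # one row per record
colnames(parsed) <- c(paste0("value", 1:5), "ms")
parsed
```

Reading raw bytes first and decoding afterwards keeps the parsing independent of the transport. A short final read (a partial record) is simply dropped here; a real client might instead buffer it until more bytes arrive.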


Solution

  • There is an answer to a similar question that you could use: "reading unsigned integer 64 bit from binary file". That question also tries to read a date.

    A more hacky answer is this:

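    library( bit64 )
    con <- file("myfile.txt", "rb")
    readBin(con, "double", size = 4, n = 5, endian = "big")  # reads the 5 floats, advancing the connection
    # Read the 8-byte field as a double, then relabel the same 64 bits as integer64
    a = readBin(con, "double", size = 8, n = 1, endian = "big")
    class(a) = "integer64"
    close(con)
    a
    # 1506078000000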
    

    Yuck! Or:

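    library( bit64 )
    con <- file("myfile.txt", "rb")
    readBin(con, "double", size = 4, n = 5, endian = "big")  # reads the 5 floats, advancing the connection
    # Read the 8-byte field as four unsigned 16-bit chunks,
    # then weight them by powers of 65536 and sum
    sum( as.integer64( readBin(con, "integer", size = 2, n = 4, endian = "big", signed = FALSE) ) *
         as.integer64(65536)^(3:0) )
    close(con)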