I have old binary files written in what was called the 'DEC' format. To get the correct value of a 4-byte floating-point number from this format, I can do the following:
I thought the endian option of readBin() [c('little', 'big', 'swap')] would take care of this, but it does not: those settings only reverse the whole byte order of each value, while DEC data needs the two 16-bit words swapped. Here is an example, with code showing the current workaround.
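To see why no endian setting helps, here is a small sketch that reads four marker bytes as one 4-byte integer. "little" and "big" give the two full reversals, but the DEC layout needs the order 3 4 1 2, which neither produces:

```r
# Four marker bytes, read as one 4-byte integer in different ways
b <- as.raw(c(0x01, 0x02, 0x03, 0x04))

little <- readBin(b, integer(), size = 4, endian = "little")  # 0x04030201
big    <- readBin(b, integer(), size = 4, endian = "big")     # 0x01020304

# The DEC layout needs the 16-bit words swapped instead: bytes 3 4 1 2
word_swapped <- readBin(b[c(3, 4, 1, 2)], integer(), size = 4, endian = "little")

sprintf("%08X", c(little, big, word_swapped))
# [1] "04030201" "01020304" "02010403"
```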
# Start with actual value from sample file:
# 4 bytes representing target value of 1.290
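# in practice dec_bytes is read in by readBin(con, raw(), n=4)
# (the writeBin() line below only creates 4 placeholder bytes; it writes IEEE,
# not DEC, bytes, so the printed results come from the real file bytes)
dec_bytes <- writeBin(1.290, raw(), size=4)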
# Now rearrange bytes swapping words
pc_bytes <- dec_bytes[c(3, 4, 1, 2)]
# Now use readBin to give numeric value of bytes
pc_float <- readBin(pc_bytes, numeric(), n=1, size=4)
pc_float
# [1] 0.5161456
# Now divide by 4 to get the correct answer
pc_float <- pc_float / 4
pc_float
#[1] 0.1290364
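The workaround above can be vectorized so that a whole raw vector read from the file is converted in one readBin() call. This is a minimal sketch; dec_to_float() is a hypothetical helper name, and it assumes the layout described above (words swapped relative to an IEEE little-endian float, IEEE reading 4 times too large):

```r
# Hypothetical helper (not part of base R): decode DEC 4-byte floats.
# Assumes the two 16-bit words are swapped relative to an IEEE little-endian
# float and that the IEEE reading of the bits is 4 times too large.
dec_to_float <- function(dec_bytes) {
  n <- length(dec_bytes) %/% 4L
  stopifnot(length(dec_bytes) == 4L * n)
  # Word-swap index for every value at once: 3 4 1 2 7 8 5 6 ...
  idx <- rep(4L * (seq_len(n) - 1L), each = 4L) + c(3L, 4L, 1L, 2L)
  readBin(dec_bytes[idx], numeric(), n = n, size = 4, endian = "little") / 4
}

# Test data: the IEEE bytes of 4 * x carry the DEC bit pattern of x; the word
# swap is its own inverse, so the same index puts them into DEC order
x <- c(1, 0.5, -2.25, 123456)
ieee4 <- writeBin(x * 4, raw(), size = 4, endian = "little")
dec <- ieee4[rep(4L * (seq_along(x) - 1L), each = 4L) + c(3L, 4L, 1L, 2L)]
identical(dec_to_float(dec), x)  # TRUE

# With a file (n_values is a placeholder):
# dec_to_float(readBin(con, raw(), n = 4 * n_values))
```

Passing endian = "little" explicitly keeps the result independent of the machine running the code.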
I can obviously wrap the steps above in a function, but the actual question is: is there an easier, more efficient way to do this? In some C code that I either wrote or found about 30 years ago, I used the following function, which I can only assume actually worked:
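#include <string.h>

/* Convert one 4-byte DEC float to an IEEE float.
   Assumes a little-endian host with IEEE 754 floats. */
float ConvertDecToFloat(const unsigned char bytes[4])
{
    unsigned char p[4];
    float f;
    p[0] = bytes[2];   /* swap the two 16-bit words */
    p[1] = bytes[3];
    p[2] = bytes[0];
    p[3] = bytes[1];
    if (p[0] || p[1] || p[2] || p[3])
        --p[3];        /* adjust exponent */
    memcpy(&f, p, sizeof f);   /* instead of *(float*)p, which breaks aliasing rules */
    return f;
}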
So --p[3] subtracts 1 from the last byte after rearranging, which gives the correct answer without having to divide by 4. I'm not sure whether this can be done in R without converting to integer and back to raw bytes.
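Why dividing by 4 and decrementing that byte are the same thing can be seen from the two format definitions. Assuming the standard VAX F_floating definition (exponent bias 128, hidden bit worth 1/2) against IEEE 754 single precision (bias 127, hidden bit worth 1), for the same sign bit s, 8-bit exponent field e and 23-bit fraction field f:

```latex
\begin{aligned}
v_{\mathrm{IEEE}} &= (-1)^s \left(1 + \frac{f}{2^{23}}\right) 2^{e-127} \\
v_{\mathrm{DEC}}  &= (-1)^s \left(\frac{1}{2} + \frac{f}{2^{24}}\right) 2^{e-128}
                   = (-1)^s \left(1 + \frac{f}{2^{23}}\right) 2^{e-129}
                   = \frac{v_{\mathrm{IEEE}}}{4}
\end{aligned}
```

Dividing by 4 means lowering e by 2. After the word swap, the fourth byte holds the sign bit and the top seven bits of e, so its lowest bit is worth 2 in e, and decrementing that byte gives e - 2 (valid for e >= 3, where no borrow reaches the sign bit). The if test leaves an all-zero value, i.e. 0.0, untouched.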
This was answered by a colleague (thanks to Michael Schwartz). The simple vectorized solution is to create a vector of indices that reorders the byte vector. I have two working solutions:
# Test on a vector with 24 bytes, convert to 6 doubles of 4 bytes each
values <- c(1, 12, 123, 1234, 12345, 123456)
pc_bytes0 <- writeBin(values, raw(), size = 4)
# Need to shuffle the byte order to reproduce DEC order
# using same procedure we will use to unshuffle
# Swapping needed to convert from PC to DEC byte order
# DEC byte 1 -> 3, 2 -> 4, 3 -> 1, 4 -> 2
byte_adjust <- rep(c(2, 2, -2, -2), 6)
# Original index order
pc_byte_index <- seq_len(24) # original byte order
# New index order for DEC data storage, add adjustment vector
dec_byte_index <- pc_byte_index + byte_adjust
# Now reshuffle the original data using the index to get the DEC order
dec_bytes <- pc_bytes0[dec_byte_index]
# This what readBin(raw()) will return from DEC file,
# so actual process starts here.
# Note: To get the true DEC byte array we would also have to add 1 to
# the 2nd byte in each 4-byte sequence (the sign/upper exponent byte)
# Approach 1, make a long vector of original byte order and another of offsets
# and add together
# Data is in DEC sequence, so make vector of original order
dec_byte_index <- seq_len(24) # original byte index order
# These are the index offsets needed
byte_adjust <- rep(c(2, 2, -2, -2), 6)
# Offset original order by adding
pc_byte_index <- dec_byte_index + byte_adjust
# Apply PC byte order to data
pc_bytes <- dec_bytes[pc_byte_index]
# Now the data can be read in the correct order and the correction applied
pc_float <- readBin(pc_bytes, double(), n=6, size=4)
pc_float
#> pc_float
#[1] 1 12 123 1234 12345 123456
# Approach 2, use single index, reshape to matrix and apply
# index representing desired order of 4 original bytes
byte_index <- c(3, 4, 1, 2)
# Convert data to matrix
dec_byte_matrix <- matrix(dec_bytes, nrow=4, ncol=6)
# Use indices to swap
pc_bytes <- dec_byte_matrix[byte_index, ]
# Now compute floats
pc_float <- readBin(pc_bytes, double(), n=6, size=4)
#> pc_float
#[1] 1 12 123 1234 12345 123456
I tested with microbenchmark and there is no discernible difference in processing time between these two approaches. Note that with original DEC data, pc_float needs to be divided by 4 to get the correct answer, unless the byte adjustment is done instead.
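The byte adjustment itself can be done in R, but only by going through integers, since R has no arithmetic on raw vectors. Below is a sketch that mirrors the C function under the same layout assumptions; dec_to_float_adj() is a hypothetical name. One practical difference from dividing by 4: for the largest DEC exponent field (255), the unadjusted bits read as Inf or NaN in IEEE, while the adjusted bytes give the finite value. Like the C version, it is only exact for exponent fields of 3 or more.

```r
# Hypothetical helper mirroring the C ConvertDecToFloat(): swap the 16-bit
# words, then decrement the sign/exponent byte instead of dividing by 4.
dec_to_float_adj <- function(dec_bytes) {
  n <- length(dec_bytes) %/% 4L
  stopifnot(length(dec_bytes) == 4L * n)
  # One column per value; reorder rows 3 4 1 2 into IEEE little-endian order
  m <- matrix(as.integer(dec_bytes), nrow = 4L)[c(3L, 4L, 1L, 2L), , drop = FALSE]
  nonzero <- colSums(m) > 0                         # the C `if (p[0] || ...)` test
  m[4L, nonzero] <- (m[4L, nonzero] - 1L) %% 256L   # --p[3], wrapping like a byte
  readBin(as.raw(m), numeric(), n = n, size = 4, endian = "little")
}

# Same test data construction as before, now including a zero
x <- c(0, 1, 0.5, -2.25, 123456)
ieee4 <- writeBin(x * 4, raw(), size = 4, endian = "little")
dec <- ieee4[rep(4L * (seq_along(x) - 1L), each = 4L) + c(3L, 4L, 1L, 2L)]
identical(dec_to_float_adj(dec), x)  # TRUE

# DEC bytes for 2^126 (exponent field 255): reading the swapped bits
# and dividing by 4 would give Inf here
dec_to_float_adj(as.raw(c(0x80, 0x7F, 0x00, 0x00))) == 2^126  # TRUE
```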