I have a binary file in which data segments are interspersed. I know locations (byte offsets) of every data segment, as well as size of those data segments, as well as the type of data points (float, float32 - meaning that every data point is coded by 4 bytes). I want to read those data segments into an array like structure (for example, numpy array or pandas dataframe), but I have trouble doing so. I've tried using numpy's memmap, but it short circuits on the last data segment, and numpy's from file just gets me bizzare results.
Sample of the code:
begin=datadf["$BEGINDATA"][0] #datadf is pandas.df that has where data begins and its size
buf.seek(begin) #buf is the file that is opened in rb mode
size=datadf["$DATASIZE"][0]+1 #same as the above
data=buf.read(size) #this should get me that data segment, but in binary
Is there a way to reliably convert to float32 from this binary data. For further clarification, I'm including printout of first 10 data points.
buf.seek(begin)
print(buf.read(40)) #10 points of float32 (4bytes) means 40
>>>b'\xa5\x10[@\x00\x00\x88@a\xf3\xf7A\x00\x00\x88@&\x93\x9bA\x00\x00\x88@\x00\x00\x00@\xfc\xcd\x08?\x1c\xe2\xbe?\x03\xf9\xa4?'
If it's any value, while there are 4 bytes (32 bit width) for each float point, every float point is capped to maximum value of 10 000
If you want a numpy.ndarray
, you can just use numpy.frombuffer
>>> import numpy as np
>>> data = b'\xa5\x10[@\x00\x00\x88@a\xf3\xf7A\x00\x00\x88@&\x93\x9bA\x00\x00\x88@\x00\x00\x00@\xfc\xcd\x08?\x1c\xe2\xbe?\x03\xf9\xa4?'
>>> np.frombuffer(data, dtype=np.float32)
array([ 3.422891 , 4.25 , 30.993837 , 4.25 , 19.44685 ,
4.25 , 2. , 0.5343931, 1.4912753, 1.2888492],
dtype=float32)