Read binary data with a header written from C in Python

I wrote a file in binary format from a C program. The format is the following:

A header with 5 doubles (a total of 40 bytes):

fwrite(&FirstNum, sizeof(double), 1, outFile);
fwrite(&SecNum, sizeof(double), 1, outFile);
fwrite(&ThirdNum, sizeof(double), 1, outFile);
fwrite(&FourthNum, sizeof(double), 1, outFile);           
fwrite(&FifthNum, sizeof(double), 1, outFile);

Then I loop over 256^3 "particles". For each particle I write 9 values: the first one is an integer and the other 8 are doubles, in the following way:

Ntot = 256*256*256;
for(i=0; i<Ntot; i++ )
  {
    fwrite(&gp[i].GID, sizeof(int), 1, outFile);

    /*----- Positions -----*/
    pos_aux[X] = gp[i].pos[X];
    pos_aux[Y] = gp[i].pos[Y];
    pos_aux[Z] = gp[i].pos[Z];

    fwrite(&pos_aux[0], sizeof(double), 3, outFile);  //Positions in 3D
    fwrite(&gp[i].DenConCell, sizeof(double), 1, outFile); //Density
    fwrite(&gp[i].poten_r[0], sizeof(double), 1, outFile); //Field 1
    fwrite(&gp[i].potDot_r[0], sizeof(double), 1, outFile); //Field 2
    fwrite(&gp[i].potDot_app1[0], sizeof(double), 1, outFile); //Field 3
    fwrite(&gp[i].potDot_app2[0], sizeof(double), 1, outFile); //Field 4
  }

Here gp is just a data structure containing the information of my particles. So each of the 256^3 particles takes a total of 68 bytes: 4 bytes for the int + 8*(8 bytes) for the doubles.

I need to read this format in Python in order to make some plots, but I'm fairly new to Python. I have read some of the answers about reading binary files with Python, but I have only been able to read my header, not the "body", i.e. the rest of the information about the particles. This is what I have tried:

import struct

Npart = 256
with open("./path/to/my/binary/file.bin", 'rb') as bdata:
    header_size = 40 # in bytes
    bheader = bdata.read(header_size)
    header_data = struct.unpack('ddddd', bheader)
    FirstNum = header_data[0]
    SecNum = header_data[1]
    ThirdNum = header_data[2]
    FourthNum = header_data[3]
    FifthNum = header_data[4]
    #Until here, if I print each number, I obtain the correct values.
    #From here, is what I've tried in order to read the 9 data of the 
    #particles
    bytes_per_part = 68
    body_size = int( (Npart**3) * bytes_per_part )
    body_data_read = bdata.read(body_size)
    #body_data = struct.unpack_from('idddddddd', bdata, offset=40)
    #body_data = struct.unpack('=i 8d', body_data_read) 
    body_data = struct.unpack('<i 8d', body_data_read)

    #+++++ Unpacking data ++++++ 
    ID_us = body_data[0]
    pos_x_us = body_data[1]
    pos_y_us = body_data[2]
    pos_z_us = body_data[3]
    DenCon_us = body_data[4]

But when I run my code, I obtain this error:

body_data = struct.unpack('<i 8d', body_data_read)
struct.error: unpack requires a string argument of length 68

I have tried with the first commented line:

#body_data = struct.unpack_from('idddddddd', bdata, offset=40)

But the error says:

struct.error: unpack requires a string argument of length 72

If I use the line

    body_data = struct.unpack('=i 8d', body_data_read)

or the line

    body_data = struct.unpack('<i 8d', body_data_read)

I obtain the error I showed first:

struct.error: unpack requires a string argument of length 68

I feel like I don't really understand the format characters "=" and "<": with them the expected length matches what I think I need to read, but the unpacking still fails. What I finally need is an array called pos_x_us with all the x positions, pos_y_us with the y positions, pos_z_us with the z positions, and so on for the other values. I would be grateful for any ideas on how to obtain this.

Solution

Your problem arose because the buffer size didn't match the format. Let's try it with some random data: 12 bytes overall, intended for an int and a double.

>>> data = '\xf4\x9f\x97\xcd\xf2\xbe\xd6\x87\x18\xe3\x17\xdf'

If the format doesn't start with '<', '>', '=', or '!', native alignment is used and padding may be added. As the struct documentation puts it:

"Padding is only automatically added between successive structure members. No padding is added at the beginning or the end of the encoded struct."

>>> struct.unpack('id', data)

Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    struct.unpack('id', data)
error: unpack requires a string argument of length 16

But

>>> struct.unpack('=id', data)
(-845701132, -1.2217466572589222e+150)

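To be more specific, 'd' takes 8 bytes on its own and 'i' takes 4. In native mode each item is aligned the way the C compiler would align it. 'iii' is fine as 12 bytes because every item has the same type and no padding is needed. In 'id', however, the double must start at an offset that is a multiple of 8 on most platforms, so 4 pad bytes follow the integer and the format needs 16 bytes. You can see the same with 'c' taking up 1 byte, but 'ci' requiring 8. This is also why your 'idddddddd' asked for 72 bytes rather than 68. struct.unpack('ddddd') worked fine only because a run of doubles needs no padding.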
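If you want to check these sizes on your own machine, struct.calcsize reports how many bytes a format occupies. This is a small sketch assuming Python 3; the variable names are only illustrative, and the native sizes mentioned in the comments are what a typical 64-bit platform gives, so they may differ elsewhere.

```python
import struct

# Standard sizes (prefix '=', '<', '>' or '!'): no padding, identical everywhere.
standard_pair = struct.calcsize('=id')      # 4 + 8 = 12
standard_record = struct.calcsize('=i8d')   # 4 + 8*8 = 68

# Native sizes (no prefix): the platform's C alignment rules apply,
# so padding may be inserted between members.
native_pair = struct.calcsize('id')         # typically 16
native_record = struct.calcsize('i8d')      # typically 72
native_ints = struct.calcsize('iii')        # same type throughout, no padding
native_header = struct.calcsize('ddddd')    # same type throughout, no padding

print('=id  :', standard_pair, '  id  :', native_pair)
print('=i8d :', standard_record, '  i8d :', native_record)
print('iii  :', native_ints, '  ddddd:', native_header)
```

The '=i8d' result is the 68 bytes per particle that the C loop writes, which is why the prefix is needed for the body but not for the header.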

Your other error comes from the format not matching the size of the buffer. struct.unpack() requires the buffer to be exactly the size of the format, whereas struct.unpack_from() only requires the buffer to be at least that size. Let's try with 24 bytes of data.

# this will fetch 12 bytes, even if the stream has more
>>> struct.unpack_from('=id', 2*data)
(-845701132, -1.2217466572589222e+150)

But

>>> struct.unpack('=id', 2*data)

Traceback (most recent call last):
  File "<pyshell#60>", line 1, in <module>
    struct.unpack('=id', 2*data)
error: unpack requires a string argument of length 12
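The same comparison can be written for Python 3, where the buffer has to be a bytes object. The following is a minimal sketch with two made-up records; record, buf and the packed values are illustrative names and data, not part of your file.

```python
import struct

# Two made-up records laid out back to back: an int followed by a double.
record = struct.Struct('=id')              # 12 bytes per record, no padding
buf = record.pack(7, 1.5) + record.pack(8, 2.5)

# unpack_from reads one record starting at the given byte offset and
# ignores whatever bytes follow it.
first = record.unpack_from(buf, 0)
second = record.unpack_from(buf, record.size)

# unpack insists on an exact size match, so the 24-byte buffer is rejected.
try:
    record.unpack(buf)
    exact_match_failed = False
except struct.error:
    exact_match_failed = True

print(first, second, exact_match_failed)
# (7, 1.5) (8, 2.5) True
```

The offset argument is what lets unpack_from step through a buffer one record at a time.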

In your case, the buffer you passed to struct.unpack() was the whole body, not a single 68-byte record:

body_size = int( (Npart**3) * bytes_per_part )
body_data_read = bdata.read(body_size)

To match that, you need a format of 'i8di8di8d...', that is, 'i8d' repeated Npart**3 times, with a single '=' at the front to switch off padding. So,

body_data = struct.unpack('='+(Npart**3)*'i8d', body_data_read)
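To try the repeated format without the real file, here is a minimal sketch assuming Python 3. It builds a small in-memory stand-in for the file with io.BytesIO (4 particles instead of 256^3, made-up values) and reads it back the same way; n_particles and fake_file are hypothetical names.

```python
import io
import struct

n_particles = 4  # stand-in for Npart**3, kept tiny for the demonstration

# Build a fake file: a 5-double header, then one (int, 8 doubles)
# record per particle, mirroring the layout written by the C code.
fake_file = io.BytesIO()
fake_file.write(struct.pack('=5d', 1.0, 2.0, 3.0, 4.0, 5.0))
for gid in range(n_particles):
    fields = [gid + 0.1 * k for k in range(1, 9)]
    fake_file.write(struct.pack('=i8d', gid, *fields))
fake_file.seek(0)

# Read the 40-byte header.
header = struct.unpack('=5d', fake_file.read(struct.calcsize('=5d')))

# Read the body with 'i8d' repeated once per particle.
body_format = '=' + n_particles * 'i8d'
body_bytes = fake_file.read(struct.calcsize(body_format))
body_data = struct.unpack(body_format, body_bytes)

ids = body_data[0::9]     # every 9th value, starting at the ID
pos_x = body_data[1::9]   # every 9th value, starting at x

print(header)
print(ids)
```

With the real file you would use the opened file object in place of fake_file and Npart**3 in place of n_particles.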

Now you have read all the data at once and can split it as you like. For example, the second value is the x coordinate of the first particle, and since this pattern repeats every 9 values, you can get the x coordinates of all particles with slicing.

pos_x_us = body_data[1::9]
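The same slicing works for every field: body_data[k::9] is column k. As an alternative sketch, assuming Python 3.4 or later, struct.iter_unpack yields one tuple per particle and zip(*...) transposes those rows into columns. This is a different route from the slicing above, and the three particles here are made-up data.

```python
import struct

record = struct.Struct('=i8d')   # one particle: int ID + 8 doubles, 68 bytes

# Three made-up particles standing in for the real body bytes.
body_bytes = b''.join(
    record.pack(gid, *(float(gid * 10 + k) for k in range(8)))
    for gid in range(3)
)

# iter_unpack yields one 9-tuple per particle; zip(*...) transposes
# the rows into nine columns, one per field.
columns = list(zip(*record.iter_unpack(body_bytes)))
(ids, pos_x, pos_y, pos_z, density,
 field1, field2, field3, field4) = columns

print(ids)     # (0, 1, 2)
print(pos_x)   # (0.0, 10.0, 20.0)
```

Note that iter_unpack requires the buffer length to be an exact multiple of the record size.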