
fetch 32bit instruction from binary file in C


I need to read 32-bit instructions from a binary file. What I have right now is:

unsigned char buffer[4];
fread(buffer,sizeof(buffer),1,file);

This puts 4 bytes into an array.

How should I connect those 4 bytes together so that I can process the 32-bit instruction later? Or should I start in a different way and not use fread at all?

My weird method right now is to create an array of 32 ints and then fill it with the bits from the buffer array.


Solution

  • The answer depends on how the 32-bit integer is stored in the binary file. (I'll assume that the integer is unsigned, because it really is an id, and use the type uint32_t from <stdint.h>.)

    Native byte order: The data was written out as an integer on this machine. Just read the integer with fread:

    uint32_t op;
    
    fread(&op, sizeof(op), 1, file);
    

    Rationale: fread reads the raw representation of the integer into memory. The matching fwrite does the reverse: it writes the raw representation to the file. If you don't need to exchange the file between platforms, this is a good method to store and read data.
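
A minimal sketch of the native-order read with its return value checked, so that a short file is noticed. The helper name read_u32_native is made up for this example and is not part of any library:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one 32-bit word in the machine's native
   byte order. Returns 1 on success, 0 on end of file or read error. */
int read_u32_native(FILE *file, uint32_t *out)
{
    assert(file != NULL && out != NULL);

    /* fread returns the number of complete items read: 1 or 0 here. */
    return fread(out, sizeof *out, 1, file) == 1;
}
```

You would typically call it in a loop such as `while (read_u32_native(file, &op)) { ... }` and stop when it returns 0.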

    Little-endian byte order: The data is stored as four bytes, least significant byte first:

    uint32_t op = 0u;
    
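    // Cast before shifting: getc returns an int, and int << 24 can overflow.
    op |= (uint32_t) getc(file);            // 0x000000AA
    op |= (uint32_t) getc(file) << 8;       // 0x0000BBaa
    op |= (uint32_t) getc(file) << 16;      // 0x00CCbbaa
    op |= (uint32_t) getc(file) << 24;      // 0xDDccbbaa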
    

    Rationale: getc reads one byte and returns it as an int between 0 and 255. (The case where the stream runs out and getc returns the negative value EOF is not considered here for brevity, viz. laziness.) Build your integer by shifting each byte you read by a multiple of 8 and OR-ing it into the existing value. The comments sketch how it works: the capital letters are being read, the lower-case letters were already there, and zeros have not yet been assigned.
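
For completeness, here is one way the little-endian read could look with the EOF case handled. This is only a sketch; the function name read_u32_le and its return convention are assumptions made for the example:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read four bytes, least significant byte first.
   Returns 1 on success, 0 if the stream ends before four bytes arrive. */
int read_u32_le(FILE *file, uint32_t *out)
{
    uint32_t value = 0;

    assert(file != NULL && out != NULL);

    for (int i = 0; i < 4; i++) {
        int c = getc(file);

        if (c == EOF)
            return 0;                     /* truncated input, *out untouched */

        /* Byte i lands at bit position 8 * i; the cast keeps the shift unsigned. */
        value |= (uint32_t) c << (8 * i);
    }

    *out = value;
    return 1;
}
```

The result is only stored once all four bytes have been read, so a truncated final instruction does not leave a half-built value behind.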

    Big-endian byte order: The data is stored as four bytes, least significant byte last:

    uint32_t op = 0u;
    
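    // Cast before shifting: getc returns an int, and int << 24 can overflow.
    op |= (uint32_t) getc(file) << 24;      // 0xAA000000
    op |= (uint32_t) getc(file) << 16;      // 0xaaBB0000
    op |= (uint32_t) getc(file) << 8;       // 0xaabbCC00
    op |= (uint32_t) getc(file);            // 0xaabbccDD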
    

    Rationale: Pretty much the same as above, only that you shift the bytes in another order.

    You can imagine little-endian and big-endian as writing the number one hundred and twenty-three (CXXIII) as either 321 or 123. The bit-shifting is similar to shifting decimal digits when dividing by or multiplying by powers of 10, only that you shift by 8 bits to multiply by 2^8 = 256 here.