Tags: c++, casting, binaryfiles, endianness

Read binary data to long int


I need to read binary data that contains a column of numbers (time tags), each recorded in 8 bytes. I know that they are stored in little-endian order. If read correctly, they should decode as (for example):

  ...  
  2147426467  
  2147426635  
  2147512936  
  ...

I notice that the numbers above sit right around the 2^31 - 1 threshold. I try to read the data and reverse the byte order with the following code (length is the total number of bytes and buffer is a pointer to an array that contains the bytes):

unsigned long int tag;
//uint64_t tag;
for (int j=0; j<length; j=j+8) //read the whole file in 8-byte blocks
   { tag = 0;
     for (int i=0; i<=7; i++) //read each block, byte by byte
        {tag ^=  ((unsigned char)buffer[j+i])<<8*i ;} //shift each byte to reverse the endianness and combine them with ^=
   }

When run, the code gives:

  ...  
  2147426467  
  2147426635  
  18446744071562097256  
  similar big numbers   
  ...

The last number is not (2^64 - 1 - correct value). I get the same result with uint64_t tag. The code succeeds if I declare tag as

unsigned int tag;

but then it fails for tags greater than 2^32 - 1. At least this makes sense.
I suppose I need some kind of cast on buffer[j+i], but I don't know how to do it.

(static_cast<uint64_t>(buffer[j+i])) 

also doesn't work.
I read a similar question but still need some help.


Solution

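  • We assume that buffer[j+i] is a char, and that char is signed on your platform. Casting to unsigned char converts buffer[j+i] into an unsigned type. However, when the << operator is applied, the unsigned char value is promoted to int, because an int can hold every value representable by unsigned char. The shift is therefore carried out in int (typically 32 bits), not in the 64-bit type of tag. Once a set bit is shifted into bit 31, the int result is, in practice, negative, and converting that negative int to the 64-bit tag sign-extends it. This is why you see 18446744071562097256, which equals 2^64 - 2^32 + 2147512936. For i >= 4 the shift count is at least the width of a 32-bit int, which is undefined behavior.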

    Your attempt to cast buffer[j+i] directly to uint64_t fails because, if char is signed, a negative byte value is sign-extended when it is converted to the unsigned type.

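    A double cast should work (that is, cast to unsigned char and then to unsigned long), but using an unsigned long variable to hold the intermediate value makes the intent of the code clearer. Note that unsigned long is only 32 bits on some platforms (64-bit Windows, for example), so uint64_t is the safer type for tag. For me, the code would look like: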

    decltype(tag) val = static_cast<unsigned char>(buffer[j+i]);
    tag ^= val << 8*i;
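
For reference, here is a minimal, self-contained sketch of the same idea applied to the whole buffer. The helper names read_le64 and read_all_tags are hypothetical (they are not in the question), and the sketch assumes the buffer is a const char* holding length bytes, as described above. It uses std::uint64_t rather than unsigned long so that the result is 64 bits wide regardless of platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode one 8-byte little-endian value starting at `bytes`.
std::uint64_t read_le64(const char* bytes) {
    assert(bytes != nullptr);
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        // Cast to unsigned char first so a negative char is not sign-extended,
        // then widen to 64 bits before shifting so the shift is not done in int.
        const std::uint64_t byte = static_cast<unsigned char>(bytes[i]);
        value |= byte << (8 * i);
    }
    return value;
}

// Hypothetical helper: decode every complete 8-byte block in the buffer.
// Trailing bytes that do not fill a whole block are ignored.
std::vector<std::uint64_t> read_all_tags(const char* buffer, std::size_t length) {
    std::vector<std::uint64_t> tags;
    tags.reserve(length / 8);
    for (std::size_t j = 0; j + 8 <= length; j += 8) {
        tags.push_back(read_le64(buffer + j));
    }
    return tags;
}
```

With the bytes 68 72 00 80 00 00 00 00 as input, read_le64 returns 2147512936 (0x80007268), the value that the original loop turned into 18446744071562097256. Using |= instead of ^= gives the same result here, since each byte occupies its own bit range; it just states the intent (combining bytes) more directly.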