Tags: c, endianness

Is there an architecture-independent method to create a little-endian byte stream from a value in C?


I am trying to transmit values between architectures, by creating a uint8_t[] buffer and then sending that. To ensure they are transmitted correctly, the spec is to convert all values to little-endian as they go into the buffer.

I read one article that discussed how to convert from one endianness to the other, and another that discussed how to check the endianness of the system.

I am curious whether there is a method to read bytes from a uint64_t or other value in little-endian order regardless of whether the system is big- or little-endian (i.e. through some sequence of bitwise operations)?

Or is the only method to first check the endianness of the system and then, if it is big-endian, explicitly convert to little-endian?


Solution

  • You can always serialize a uint64_t value to an array of uint8_t in little-endian order simply as

    uint64_t source = ...;
    uint8_t target[8];
    
    target[0] = source;
    target[1] = source >> 8;
    target[2] = source >> 16;
    target[3] = source >> 24;
    target[4] = source >> 32;
    target[5] = source >> 40;
    target[6] = source >> 48;
    target[7] = source >> 56;
    

    or

    for (size_t i = 0; i < sizeof (uint64_t); i++) {
        target[i] = source >> (i * 8);
    }
    

    and this will work anywhere uint64_t and uint8_t exist.

    Note that this assumes the source value is unsigned. Right-shifting a negative signed value gives an implementation-defined result, so convert the value to an unsigned type before shifting.
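
One way to handle a signed value, sketched here rather than taken from the original answer, is to convert it to uint64_t first and shift the unsigned copy. The conversion from signed to unsigned is well defined (the value is reduced modulo 2^64), so the bytes written are the two's complement representation. The function name serialize_i64_le is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a signed 64-bit value as 8 little-endian bytes.
   The value is converted to unsigned first so that every shift is performed
   on an unsigned operand. */
void serialize_i64_le(int64_t value, uint8_t *target) {
    uint64_t bits = (uint64_t)value;  /* well defined: reduced modulo 2^64 */
    for (size_t i = 0; i < sizeof bits; i++) {
        target[i] = (uint8_t)(bits >> (i * 8));
    }
}
```

Under this sketch, the value -2 is written as the byte 0xFE followed by seven 0xFF bytes.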


    Deserialization is a bit more complex if reading a byte at a time in order:

    uint8_t source[8] = ...;
    uint64_t target = 0;
    
    for (size_t i = 0; i < sizeof (uint64_t); i++) {
        target |= (uint64_t)source[i] << (i * 8);
    }
    

    The cast to (uint64_t) is absolutely necessary, because the operands of << undergo integer promotions, and a uint8_t would always be converted to a signed int. Shifting a set bit into the sign bit of a signed int, or shifting by at least the width of int, is undefined behavior.
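
Wrapped in a function, the deserialization loop might look like the following minimal sketch. The name deserialize_le is hypothetical, and the caller is assumed to pass a buffer holding at least 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rebuild a uint64_t from 8 bytes stored
   least-significant byte first. Assumes source points to at least 8 bytes. */
uint64_t deserialize_le(const uint8_t *source) {
    uint64_t target = 0;
    for (size_t i = 0; i < sizeof (uint64_t); i++) {
        /* Widen to uint64_t before shifting so the shift happens in 64 bits. */
        target |= (uint64_t)source[i] << (i * 8);
    }
    return target;
}
```

Because only shifts and ORs on the value are used, the result does not depend on the byte order of the machine running the code.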


    If you write this into a function

    #include <inttypes.h>
    
    void serialize(uint64_t source, uint8_t *target) {
        target[0] = source;
        target[1] = source >> 8;
        target[2] = source >> 16;
        target[3] = source >> 24;
        target[4] = source >> 32;
        target[5] = source >> 40;
        target[6] = source >> 48;
        target[7] = source >> 56;
    }
    

    and compile for x86-64 using GCC 11 and -O3, the function will be compiled to

    serialize:
            movq    %rdi, (%rsi)
            ret
    

    which just moves the 64-bit value of source into the target array as is. If you reverse the indices (7 ... 0; big-endian), GCC will be clever enough to recognize that too and will compile it (with -O3) to

    serialize:
            bswap   %rdi
            movq    %rdi, (%rsi)
            ret
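
For reference, the reversed-index (big-endian) variant described above could be written as in the sketch below. The name serialize_be is hypothetical, and whether a particular compiler reduces it to a byte swap plus a single store depends on the compiler version and optimization flags.

```c
#include <stdint.h>

/* Hypothetical big-endian counterpart of serialize(): the most significant
   byte goes to target[0], the least significant byte to target[7]. */
void serialize_be(uint64_t source, uint8_t *target) {
    target[7] = (uint8_t)source;
    target[6] = (uint8_t)(source >> 8);
    target[5] = (uint8_t)(source >> 16);
    target[4] = (uint8_t)(source >> 24);
    target[3] = (uint8_t)(source >> 32);
    target[2] = (uint8_t)(source >> 40);
    target[1] = (uint8_t)(source >> 48);
    target[0] = (uint8_t)(source >> 56);
}
```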