Tags: c, gcc, x86, compiler-optimization, endianness

How to make GCC generate bswap instruction for big endian store without builtins?


Update: This was fixed in GCC 8.1.

I'm working on a function that stores a 64-bit value into memory in big endian format. I was hoping that I could write portable C99 code that works on both little and big endian platforms and have modern x86 compilers generate a bswap instruction automatically without any builtins or intrinsics. So I started with the following function:

#include <stdint.h>

void
encode_bigend_u64(uint64_t value, void *vdest) {
    uint8_t *bytes = (uint8_t *)vdest;
    bytes[0] = value >> 56;
    bytes[1] = value >> 48;
    bytes[2] = value >> 40;
    bytes[3] = value >> 32;
    bytes[4] = value >> 24;
    bytes[5] = value >> 16;
    bytes[6] = value >> 8;
    bytes[7] = value;
}

This works fine with clang, which compiles this function to:

bswapq  %rdi
movq    %rdi, (%rsi)
retq

But GCC fails to detect the byte swap. I tried a couple of different approaches but they only made things worse. I know that GCC can detect byte swaps using bitwise-and, shift, and bitwise-or, but why doesn't it work when writing bytes?

Edit: I found the corresponding GCC bug.


Solution

  • This seems to do the trick:

    #include <stdint.h>
    #include <string.h>  /* memcpy */

    void encode_bigend_u64(uint64_t value, void* dest)
    {
      /* Reverse the byte order with masks and shifts. */
      value =
          ((value & 0xFF00000000000000u) >> 56u) |
          ((value & 0x00FF000000000000u) >> 40u) |
          ((value & 0x0000FF0000000000u) >> 24u) |
          ((value & 0x000000FF00000000u) >>  8u) |
          ((value & 0x00000000FF000000u) <<  8u) |
          ((value & 0x0000000000FF0000u) << 24u) |
          ((value & 0x000000000000FF00u) << 40u) |
          ((value & 0x00000000000000FFu) << 56u);
      /* memcpy avoids alignment and aliasing problems with dest. */
      memcpy(dest, &value, sizeof(uint64_t));
    }
    

    clang with -O3

    encode_bigend_u64(unsigned long, void*):
            bswapq  %rdi
            movq    %rdi, (%rsi)
            retq
    

    clang with -O3 -march=native

    encode_bigend_u64(unsigned long, void*):
            movbeq  %rdi, (%rsi)
            retq
    

    gcc with -O3

    encode_bigend_u64(unsigned long, void*):
            bswap   %rdi
            movq    %rdi, (%rsi)
            ret
    

    gcc with -O3 -march=native

    encode_bigend_u64(unsigned long, void*):
            movbe   %rdi, (%rsi)
            ret
    

    Tested with clang 3.8.0 and gcc 5.3.0 on http://gcc.godbolt.org/. I don't know exactly which processor is underneath for -march=native, but I strongly suspect a recent x86_64 processor.
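
    To check that the mask-and-shift version stores the same bytes as the byte-by-byte version from the question, both can be run on a known value. This is only a minimal sketch: the names encode_bytes, encode_swap and encoders_agree are made up for this check, and because the swap is applied unconditionally it assumes a little-endian host.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Byte-by-byte store, as in the question (works on any host). */
    void encode_bytes(uint64_t value, void *vdest) {
        uint8_t *bytes = (uint8_t *)vdest;
        for (int i = 0; i < 8; i++)
            bytes[i] = (uint8_t)(value >> (56 - 8 * i));
    }

    /* Mask-and-shift swap plus memcpy, as in the answer.
       Assumption: little-endian host, since the swap is unconditional. */
    void encode_swap(uint64_t value, void *dest) {
        value =
            ((value & 0xFF00000000000000u) >> 56u) |
            ((value & 0x00FF000000000000u) >> 40u) |
            ((value & 0x0000FF0000000000u) >> 24u) |
            ((value & 0x000000FF00000000u) >>  8u) |
            ((value & 0x00000000FF000000u) <<  8u) |
            ((value & 0x0000000000FF0000u) << 24u) |
            ((value & 0x000000000000FF00u) << 40u) |
            ((value & 0x00000000000000FFu) << 56u);
        memcpy(dest, &value, sizeof value);
    }

    /* Returns 1 if both encoders write the same 8 bytes for value. */
    int encoders_agree(uint64_t value) {
        uint8_t a[8], b[8];
        encode_bytes(value, a);
        encode_swap(value, b);
        return memcmp(a, b, sizeof a) == 0;
    }
    ```

    On a little-endian machine encoders_agree should return 1 for every input; on a big-endian host the unconditional swap would store the bytes reversed, which is what the endianness check further down addresses.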


    If you want a function that also works on big-endian architectures, you can use the answers from here to detect the endianness of the system and add an if. Both the union and the pointer-cast versions work and are optimized by both gcc and clang to the exact same assembly (no branches). Full code on godbolt:

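    int is_big_endian(void)
    {
        /* On a big-endian host the most significant byte (0x01) is stored first. */
        union {
            uint32_t i;
            char c[4];
        } bint = {0x01020304};

        return bint.c[0] == 1;
    }

    void encode_bigend_u64_union(uint64_t value, void* dest)
    {
      if (!is_big_endian()) {
        //... (byte-swap value with the masks and shifts shown above)
      }
      memcpy(dest, &value, sizeof(uint64_t));
    }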
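
    Filling the elided branch with the mask-and-shift swap from the first snippet gives a complete version that can be compiled on its own. Treat it as a sketch rather than the exact code behind the godbolt link: bswap_u64 is a helper name introduced here for readability.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Returns 1 on a big-endian host, 0 on a little-endian one. */
    int is_big_endian(void) {
        union {
            uint32_t i;
            char c[4];
        } bint = {0x01020304};
        return bint.c[0] == 1;
    }

    /* Hypothetical helper: reverses the byte order of a 64-bit value. */
    uint64_t bswap_u64(uint64_t value) {
        return ((value & 0xFF00000000000000u) >> 56u) |
               ((value & 0x00FF000000000000u) >> 40u) |
               ((value & 0x0000FF0000000000u) >> 24u) |
               ((value & 0x000000FF00000000u) >>  8u) |
               ((value & 0x00000000FF000000u) <<  8u) |
               ((value & 0x0000000000FF0000u) << 24u) |
               ((value & 0x000000000000FF00u) << 40u) |
               ((value & 0x00000000000000FFu) << 56u);
    }

    /* Stores value at dest in big-endian byte order on either kind of host. */
    void encode_bigend_u64_union(uint64_t value, void *dest) {
        if (!is_big_endian())
            value = bswap_u64(value);  /* swap only where the host order differs */
        memcpy(dest, &value, sizeof value);
    }
    ```

    The endianness test is a compile-time constant for the optimizer, which is why no branch should survive in the generated assembly.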
    

    Intel® 64 and IA-32 Architectures Instruction Set Reference (3-542 Vol. 2A):

    MOVBE—Move Data After Swapping Bytes

    Performs a byte swap operation on the data copied from the second operand (source operand) and store the result in the first operand (destination operand). [...]

    The MOVBE instruction is provided for swapping the bytes on a read from memory or on a write to memory; thus providing support for converting little-endian values to big-endian format and vice versa.
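
    Since MOVBE also covers the read direction, the encoder can be mirrored by a loader for big-endian data. The following is a hedged sketch, not something from the original code: decode_bigend_u64 is a name made up here, and whether a given compiler turns it into a single bswap or movbe load depends on its version and flags, so check the generated assembly.

    ```c
    #include <stdint.h>

    /* Reads 8 bytes stored in big-endian order and returns the value.
       Shifting each byte into place is independent of the host byte order. */
    uint64_t decode_bigend_u64(const void *vsrc) {
        const uint8_t *bytes = (const uint8_t *)vsrc;
        return ((uint64_t)bytes[0] << 56) |
               ((uint64_t)bytes[1] << 48) |
               ((uint64_t)bytes[2] << 40) |
               ((uint64_t)bytes[3] << 32) |
               ((uint64_t)bytes[4] << 24) |
               ((uint64_t)bytes[5] << 16) |
               ((uint64_t)bytes[6] <<  8) |
               ((uint64_t)bytes[7]);
    }
    ```

    The casts to uint64_t before shifting matter: without them each byte would be promoted to int, and shifts of 32 or more would be undefined.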