How do I byte-swap a signed number in C?


I understand that casting from an unsigned type to a signed type of equal rank produces an implementation-defined value:

C99 6.3.1.3:

  3. Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

This means I don't know how to byte-swap a signed number. For instance, suppose I am receiving two-byte, two's complement signed values in little-endian order from a peripheral device, and processing them on a big-endian CPU. The byte-swapping primitives in the C library (like ntohs) are defined to work on unsigned values. If I convert my data to unsigned so I can byte-swap it, how do I reliably recover a signed value afterward?


Solution

  • I understand that casting from an unsigned type to a signed type of equal rank produces an implementation-defined value.

    It is implementation-defined only because the representation of signed integers in C is implementation-defined: the standard allows two's complement, one's complement, and sign-and-magnitude. Two's complement is simply one of the permitted representations.

    So the only problem arises if either side of the transmission does not use two's complement, which is unlikely to happen in the real world. I would not bother designing programs to be portable to obscure, extinct one's complement computers from the dark ages.
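    If you want the two's complement assumption to be checked rather than silently relied upon, a compile-time assertion is one option. The following is a minimal sketch, assuming a C11 compiler (static_assert comes from <assert.h> there; C99 has no equivalent). It also leans on the fact that int16_t, where it exists, is required to be a two's complement type without padding bits.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* int16_t, INT16_MIN */

/* In two's complement, -1 has all bits set, so masking with 3 yields 3.
   One's complement would yield 2 and sign-and-magnitude would yield 1. */
static_assert((-1 & 3) == 3, "this code assumes two's complement integers");

/* int16_t is only provided when the target has an exact-width 16-bit
   two's complement type, so its minimum value is always -32768. */
static_assert(INT16_MIN == -32767 - 1, "unexpected int16_t range");
```

    On any mainstream compiler both assertions pass and generate no code; on an exotic target the build stops with a readable message instead of producing wrong values at run time.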

    This means I don't know how to byte-swap a signed number. For instance, suppose I am receiving two-byte, two's complement signed values in little-endian order from a peripheral device, and processing them on a big-endian CPU.

    I suspect a source of confusion here is the assumption that a generic two's complement number is transmitted by a sender that is either big or little endian and received by a machine that is also either big or little endian. Data transmission protocols don't work like that, though: they explicitly specify endianness and signedness format, so both sides have to adapt to the protocol.

    And once that's specified, there's really no rocket science here: you are receiving 2 raw bytes. Store them in an array of raw data. Then assign them to your two's complement variable. Suppose the protocol specified little endian:

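    int16_t val;
    uint8_t little[2];  /* raw bytes as received; little[0] is the low byte */
    
    /* high byte shifted into bits 8..15, low byte in bits 0..7 */
    val = (little[1]<<8) | little[0];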
    

    Bit shifting has the advantage of being endian-independent, so the above code works whether your CPU is big or little endian. Although this code contains plenty of ugly implicit promotions, it is portable across byte orders on any two's complement CPU. C is guaranteed to treat the above as this:

    val = (int16_t)( ((int)((int)little[1]<<8)) | (int)little[0] );
    

    The result type of the shift operator is that of its promoted left operand. The result type of | is the balanced type (usual arithmetic conversions).

    Left-shifting a negative signed number would give undefined behavior, but we get away with the shift because the individual bytes are unsigned. When they are implicitly promoted to int, the values are still non-negative.

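    And since int is guaranteed to be at least 16 bits, the promoted operands are wide enough to hold both bytes on all CPUs. One caveat: where int is exactly 16 bits, shifting a byte of 0x80 or more left by 8 produces a value that does not fit in int, which is formally undefined; casting the bytes to an unsigned type before shifting avoids this.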
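    To make the shift expression concrete, here it is wrapped in a small function. This is only a sketch: decode_le16 is a hypothetical name, and it assumes a two's complement target where int is wider than 16 bits, as on typical 32-bit and 64-bit systems.

```c
#include <stdint.h>

/* Hypothetical helper: decode a little-endian, two's complement 16-bit
   value from two raw bytes, using the shift expression discussed above. */
static int16_t decode_le16(const uint8_t little[2])
{
    /* Both bytes are promoted to int before the shift and the OR, so the
       intermediate result is a non-negative int in the range 0..0xFFFF.
       The final conversion to int16_t relies on two's complement. */
    return (int16_t)((little[1] << 8) | little[0]);
}
```

    With the bytes 0x34, 0x12 the function returns 0x1234, and with 0xFE, 0xFF it returns -2. The host byte order never enters into it, because the array index alone decides which byte is the high one.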

    Alternatively, you could use a pedantic style that completely excludes all implicit promotions/conversions:

    val = (int16_t) ( ((uint32_t)little[1] << 8) | (uint32_t)little[0] );
    

    But this comes at the cost of readability.
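    If you would rather not depend on the implementation-defined unsigned-to-signed conversion at all, the signed value can be computed arithmetically. This goes beyond the casts shown above and is only a sketch; u16_to_s16 and decode_le16_portable are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: map a 16-bit two's complement bit pattern stored in
   a uint16_t to its signed value, never converting an out-of-range
   unsigned value to a signed type. */
static int16_t u16_to_s16(uint16_t u)
{
    if (u <= INT16_MAX) {
        return (int16_t)u;  /* value fits, so the conversion is exact */
    }
    /* u is in 0x8000..0xFFFF. UINT16_MAX - u is in 0..0x7FFF and fits in
       int, so negating it and subtracting 1 gives -32768..-1. */
    return (int16_t)(-(int)(UINT16_MAX - u) - 1);
}

/* Hypothetical helper: assemble the bytes in unsigned arithmetic first,
   then recover the sign with u16_to_s16. */
static int16_t decode_le16_portable(const uint8_t little[2])
{
    uint16_t u = (uint16_t)(((unsigned)little[1] << 8) | little[0]);
    return u16_to_s16(u);
}
```

    Here the wire format is still assumed to be two's complement, but nothing depends on how the receiving CPU represents negative numbers. Compilers for two's complement targets typically reduce u16_to_s16 to a plain move, so the extra branch rarely costs anything.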