Tags: c, c99, unsigned, bitmask

Are bitmasks mandatory for unsigned conversions?


I'm implementing a toy project to learn C and I have a seemingly simple question about unsigned type conversion rules.

In particular, I would like to know whether the C standard expects unsigned types converted to smaller unsigned types to simply lose their most significant bits, without any bitmask being applied.

Example: 0xABC (16 bit) -> 0xBC (8 bit)

Example code:

#include <stdint.h>
#include <stdio.h>

void print_small_hex_value(uint8_t value) {
    printf("Small hex value from function: %llx\n", value);
}

int main()
{
    uint64_t large_value = 0xABCDEFABCDEFABCD;
    printf("Large hex value: %llx\n", large_value);
    uint8_t small_value = large_value; /* without bit mask */
    printf("Small hex value: %llx\n", small_value);
    uint8_t small_value_masked = large_value & 0xFF; /* with bit mask */
    printf("Small hex value masked: %llx\n", small_value);
    printf("\n");
    print_small_hex_value(large_value); /* print from function */
    print_small_hex_value(large_value & 0xFF);
    print_small_hex_value(small_value);
}

Output:

Large hex value: abcdefabcdefabcd
Small hex value: cd
Small hex value masked: cd

Small hex value from function: cd
Small hex value from function: cd
Small hex value from function: cd

It seems to me that the “magical” conversion works even without bit masks.

So why do many codebases (e.g., CPython) force the bits through a bitmask (i.e., value & 0xFF)? Is the mask simply elided later by compilers when it is not necessary? Or is it just me not noticing that in these cases you are really dealing with signed integers?

What's the difference if the larger value (e.g., uint64_t) is passed as a uint8_t parameter or stored in a uint8_t variable? Are the two cases treated differently by compilers?

Can someone point to a trusted source on this matter (like the C standard)?


Solution

  • Does the C standard expect unsigned types converted to smaller unsigned types to simply lose their most significant bits without using any bitmask?

    Yes.

    The line:

    printf("Small hex value: %llx\n", small_value);

    and similar others are invalid (see https://godbolt.org/z/b7xa794x1). %llx expects an unsigned long long argument, but small_value has type uint8_t, so these calls have undefined behavior. You should use PRIx8 from <inttypes.h> to print it.
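
    For example, a fixed version of the program might look like this (a minimal sketch; PRIx8 and PRIx64 are the format macros from <inttypes.h>):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    void print_small_hex_value(uint8_t value) {
        /* PRIx8 expands to the correct conversion specifier for uint8_t */
        printf("Small hex value from function: %" PRIx8 "\n", value);
    }

    int main(void)
    {
        uint64_t large_value = 0xABCDEFABCDEFABCD;
        /* PRIx64 is the portable specifier for uint64_t, instead of %llx */
        printf("Large hex value: %" PRIx64 "\n", large_value);
        print_small_hex_value(large_value); /* implicit conversion to uint8_t */
        return 0;
    }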

    Is the mask simply elided later by compilers when it is not necessary?

    Generally, yes.
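
    As a rough illustration (my own sketch, not from the question), the following two functions typically compile to identical machine code, because the mask is redundant with the conversion; you can confirm this on a compiler explorer such as godbolt.org:

    #include <stdint.h>

    uint8_t truncate_plain(uint64_t v)  { return v; }         /* implicit conversion */
    uint8_t truncate_masked(uint64_t v) { return v & 0xFFu; } /* explicit mask */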

    Is it just me not noticing that in these cases you are really dealing with signed integers?

    No.

    What's the difference if the larger value (e.g., uint64_t) is passed as a uint8_t parameter or stored in a uint8_t variable?

    No difference.
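
    To make that concrete, here is a minimal sketch (take() is a hypothetical helper): with a prototype in scope, each argument is converted to the type of its parameter as if by assignment (C11 6.5.2.2p7), so both forms below perform exactly the same uint64_t-to-uint8_t conversion:

    #include <stdint.h>

    static void take(uint8_t v) { (void)v; } /* hypothetical helper; prototype in scope */

    void demo(uint64_t big)
    {
        uint8_t local = big; /* converted on initialization */
        take(big);           /* converted at the call, as if by assignment */
        take(local);         /* already uint8_t, no conversion */
    }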

    Are the two cases treated differently by compilers?

    Apart from the obvious, no.

    Can someone point to a trusted source on this matter (like the C standard)?

    When a value is assigned to a variable of a particular type, that value is converted to the destination type. For conversions to unsigned types, C11 6.3.1.3p2 (https://port70.net/~nsz/c/c11/n1570.html#6.3.1.3p2) says:

    Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type

    0xABCDEFABCDEFABCD is 12379814471884843981 in decimal. We repeatedly subtract 256 (one more than UINT8_MAX) from this number, 48358650280800171 times in total, and are left with 205, which is 0xCD in hex. This is basically a fancy, representation-independent way of describing & 0xFF.
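
    You can check that arithmetic with a small program (my own sketch); reduction modulo 256 is exactly what the repeated subtraction computes, and for a power of two it coincides with masking the low bits:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t large_value = 0xABCDEFABCDEFABCD; /* 12379814471884843981 */
        printf("%" PRIu64 "\n", large_value % 256);   /* 205 */
        printf("%" PRIu64 "\n", large_value & 0xFF);  /* 205 */
        printf("%" PRIu8 "\n", (uint8_t)large_value); /* 205 */
        return 0;
    }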

    Nowadays we also have the more digestible cppreference page: https://en.cppreference.com/w/c/language/conversion .

    Why do many codebases (e.g., CPython) force the bits through a bitmask (i.e., value & 0xFF)?

    It may be the programmer's preference, for readability or maintainability. There are also coding standards for C: MISRA C:2012 Rule 10.3, for example, requires you to write uint8_t small_value = (uint8_t)large_value; with an explicit cast, but I do not know of a rule that would require masking.
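
    For comparison, here is a sketch (my own example) of the three styles side by side; all of them return the same value, and the choice mostly communicates intent:

    #include <stdint.h>

    uint8_t implicit_style(uint64_t v) { return v; }          /* relies on 6.3.1.3p2; may trigger -Wconversion warnings */
    uint8_t cast_style(uint64_t v)     { return (uint8_t)v; } /* explicit cast, MISRA-style */
    uint8_t mask_style(uint64_t v)     { return v & 0xFFu; }  /* explicit mask, as in codebases like CPython */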