
How does the last byte of an integer value impact char casts in C conversions?


I am learning C. While studying strcmp for another question, I ended up on this old MySQL bug: https://bugs.mysql.com/bug.php?id=64884

From my understanding, the bug arose because memcmp was used to compare two uint8 hash values for equality. When the two hashes were not equal, memcmp returned a non-zero integer (positive or negative, depending on which buffer has the larger byte at the first difference). That integer was then implicitly converted to char, the return type of the function; a 0 value means the hashes are equal, and I assume any other value means they are not. In some cases the function returned 0 even though the hashes were not equal, which let people access a database with the wrong password.

This last part is where I am getting confused. The bug page says "If memcmp happen to return a non-zero number that has a zero last byte - check_scramble will return 0", but I am not sure how the last byte of an integer could affect the value of the converted char.

I tested locally by writing a simple function

char int2char(int num) {
  return num;
}

and then printing its return value for two non-zero integers, one with a non-zero last byte and one with a zero last byte:

printf("%d\n", int2char(0x000000FF)); // prints -1 (char is signed here)
printf("%d\n", int2char(0x00FF0000)); // prints 0

So this really does happen, but why?

Bonus question if possible: The fix for this from the bug page was to wrap memcmp(..) with test(memcmp(..)) but I couldn't find out what the test function does and how it can solve the original problem.

EDIT: I appreciate the replies from everyone. It also seems that test is probably a custom function used in the MySQL code base.


Solution

  • What you're seeing is a result of what happens when an int is converted to a char. Section 6.3.1.3 of the C standard regarding conversion between integer types states:

    • 1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
    • 2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
    • 3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

    Assuming char is signed, clause 3 is what applies here, which states that the result is implementation-defined. The GCC documentation on integer conversions says:

    The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).

    For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.

    We see that the value is reduced modulo 2^N. Assuming signed integers use two's complement, this amounts to keeping the low-order N bits of the original value.

    If char is unsigned, then clause 2 above applies instead, which, as in the signed case, amounts to keeping the low-order N bits of the original value.

    In the case of converting an int to a char (whether char is signed or unsigned), this means that the low-order byte is the result of the conversion. So if the return value of memcmp contains a 0 in the low-order byte, as 0x00FF0000 does, converting this value to char results in 0.
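The conversion described above can be sketched in C as follows. This is only an illustration under stated assumptions: char is 8 bits wide, int is at least 32 bits, and the compiler reduces modulo 2^N as GCC documents. The function names low_order_byte, narrowed_result and normalized_result are hypothetical and do not come from the MySQL source. The last function shows one way a wrapper could avoid the problem by collapsing the value to 0 or 1 before narrowing; whether MySQL's test does exactly this is an assumption, since its definition is not shown here.

```c
#include <assert.h>
#include <limits.h>

/* Clause 2 written out as a mask: value modulo (UCHAR_MAX + 1),
   which is the low-order byte when char is 8 bits wide. */
unsigned int low_order_byte(int value) {
    return (unsigned int)value & UCHAR_MAX;
}

/* Hypothetical stand-in for a function that returns a memcmp-style
   int result as char. The cast does explicitly what the implicit
   conversion on return would do: only the low-order byte survives
   (implementation-defined when char is signed; GCC reduces modulo 2^N). */
char narrowed_result(int cmp_result) {
    assert(CHAR_BIT == 8); /* assumption of this sketch */
    return (char)cmp_result;
}

/* Hypothetical fix: collapse the int to 0 or 1 first, so every
   non-zero input stays non-zero after narrowing to char. */
char normalized_result(int cmp_result) {
    return (char)(cmp_result != 0);
}
```

With these definitions, narrowed_result(0x00FF0000) is 0 even though the argument is non-zero, while normalized_result(0x00FF0000) is 1, because the comparison against 0 happens before any bits are discarded.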