Search code examples
c++signed

C++ unsigned characters not working as intended


I've always dealt with unsigned numbers but for some weird reason they're not working in this code.

Here is the function I have

string compare (unsigned char H[4])
{
unsigned int sum = H[3] + H[2] * 0x100 + H[1] * 0x100 * 0x100 + H[0] * 0x100 * 0x100 * 0x100;
switch(sum)
    {
    case 0x414b4220: return "akb"; break;
    case 0x89425446: return "png"; break;
    case 0xefbbbf3c: return "plist"; break;
    case 0x4c574600:
    case 0x5df90000: return "lwf"; break;
    case 0x4a4d5000:
    case 0x424d4900:
    case 0x434c5300:
    case 0x43485000:
    case 0x53544700:
    case 0x4d415000: return "bin"; break;
    default: return "";
    }

}

The function is being called this way:

unsigned char C[4];
fin2>>C[0]>>C[1]>>C[2]>>C[3];
string ext = compare(C);

The output I'm getting for the value of sum is always 0x89425446, however the value return is always "". So where am I going wrong?

Note: Sometimes the output I get is 0x80000000


Solution

  • I have figured out the problem, I hope:

    H[0] * 0x100 * 0x100 * 0x100
    

    (Let us assume 32-bit integers)

    According to this answer, 0x100 is actually a signed constant. Furthermore, according to implicit integer promotion rules:

    unsigned char or unsigned short can be converted to int if it can hold its entire value range, and unsigned int otherwise;

    Thus, the H[0] will be converted to a signed int, so the whole expression is signed, so an H[0] of 0x89 will result in an attempt to get a value of (signed int) (0x89000000), or in other words you will have an integer overflow, undefined behaviour, which in your case gives you an unexpectedly negative value.

    As I mentioned in my comments, the canonical way of converting four unsigned bytes to an unsigned integer is:

    unsigned int sum = (unsigned int) H[3]
                    | ((unsigned int) H[2] << 8)
                    | ((unsigned int) H[1] << 16)
                    | ((unsigned int) H[0] << 24);
    

    This explicitly promotes H[] to unsigned int, and using a << shift avoids problems of unexpected sign conversions.

    PS: Please do not be rude to fellow users - I almost deleted this answer when I saw your comment.