Tags: c, gcc, visual-c++, bit-manipulation, strtol

Casting hex string to signed int results in different values in different platforms


I am dealing with an edge case in a program that I want to be multi-platform. Here is an extract that shows the problem:

#include <stdio.h>
#include <stdlib.h> /* strtol */

/* Print the bits of an object, most significant byte first. */
void print_bits(size_t const size, void const * const ptr){
    unsigned char *b = (unsigned char*) ptr;
    unsigned char byte;
    int i, j;

    for (i = size - 1; i >= 0; i--)
    {
        for (j = 7; j >= 0; j--)
        {
            byte = (b[i] >> j) & 1;
            printf("%u", byte);
        }
    }
    puts("");
}

int main() {

    char* ascii = "0x80000000";
    int myint = strtol(ascii, NULL, 16);

    printf("%s to signed int is %d and bits are:\t", ascii, myint);
    print_bits(sizeof myint, &myint);

    return 0;
}

So when I compile with GCC on Linux I get this output:

0x80000000 to signed int is -2147483648 and bits are:   10000000000000000000000000000000

On Windows, using both MSVC and MinGW, I get:

0x80000000 to signed int is 2147483647 and bits are:    01111111111111111111111111111111

I think GCC outputs the correct, expected values. My question is: where does this difference come from, and how can I make sure I get the correct result on all compilers?

UPDATE

The reason behind this code is that I have to check whether the MSB (bit #31) of the hex value is 0 or 1. Then I have to get the unsigned integer value of the next 7 bits (#30 to #24); in the case of 0x80000000, these 7 bits should result in 0:

int msb_is_set = myint & 1;
uint8_t next_7_bits;

next_7_bits = myint >> 24; // fine on GCC, outputs 0 for the next 7 bits
#ifdef WIN32 // If I do not do this, next_7_bits will be 127 on Windows instead of 0
if (msb_is_set)
    next_7_bits = myint >> 1;
#endif
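
For illustration, here is a minimal sketch of the extraction I am ultimately after, written against an unsigned 32-bit value so the shifts are well defined (the uint32_t variable value and the masks are placeholders of mine, not part of the snippet above):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t value = 0x80000000u;  /* example input; would come from the parsed string */

    /* Bit #31: shift it down to position 0 and mask. */
    unsigned msb_is_set = (value >> 31) & 0x1u;

    /* Bits #30..#24: shift them down to positions 6..0 and drop bit #31. */
    uint8_t next_7_bits = (uint8_t)((value >> 24) & 0x7Fu);

    printf("msb=%u next7=%u\n", msb_is_set, (unsigned)next_7_bits);  /* msb=1 next7=0 */
    return 0;
}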

P.S. This is on the same machine (a 64-bit i5).


Solution

  • You're dealing with different data models here.

    Windows 64 uses LLP64, which means only long long and pointers are 64-bit; long stays 32-bit. Since strtol converts to long, it yields a 32-bit value, and 0x80000000 in a 32-bit signed integer is negative.

    Linux 64 uses LP64, so long, long long and pointers are all 64-bit. I guess you see what's happening here now ;)


    Thanks to the comments, I realize my initial answer was wrong. The different outcome does indeed have to do with the differing data models on those platforms. But: in the case of the LP64 model, you have a conversion to a signed type that cannot hold the value, and that conversion is implementation-defined. int is 32-bit on both platforms, and a 32-bit int simply cannot hold 0x80000000. So the correct answer is: you shouldn't rely on any particular result from your code on 64-bit Linux. On Win64, since long is only 32-bit, strtol() correctly returns LONG_MAX for 0x80000000, which happens to be just one less than your input.
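
    If you need the same, well-defined value on both platforms, one option is to parse with strtoul() and keep the result in a fixed-width unsigned type such as uint32_t. The following is only a sketch of that idea (variable names and error handling are mine, for illustration):

    #include <errno.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const char *ascii = "0x80000000";
        char *end = NULL;

        errno = 0;
        unsigned long parsed = strtoul(ascii, &end, 16);  /* unsigned parse, no clamping to LONG_MAX */

        /* The range check only matters where unsigned long is wider than 32 bits (LP64). */
        if (end == ascii || errno == ERANGE || parsed > UINT32_MAX) {
            fprintf(stderr, "not a valid 32-bit hex value: %s\n", ascii);
            return 1;
        }

        uint32_t value = (uint32_t)parsed;  /* same bit pattern under LP64 and LLP64 */
        printf("%s parsed as %" PRIu32 " (0x%08" PRIX32 ")\n", ascii, value, value);
        return 0;
    }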