c scanf integer-overflow twos-complement

sscanf handles maximal unsigned integer value differently than assignment does


Consider the following code:

#include <stdio.h>

int main(void)
{
  int assigned = 4294967295;     // Max value of a 32-bit unsigned int; does not fit in int

  char input[] = "4294967295";
  int sscanned;


  unsigned int result = sscanf(input, "%d", &sscanned);
  printf ("scanned %u elements : %d\n"
          "Assigned j = %d\n", 
          result, sscanned, assigned);

  return 0;
}

When compiled for a 32-bit target (with the command gcc -Wall -Wextra -std=c11 -pedantic -m32 test_sscanf.c -o test_sscanf32), it emits the expected warning: "overflow in conversion from ‘long long int’ to ‘int’ changes value from ‘4294967295’ to ‘-1’ [-Woverflow]".

Here is the result:

> ./test_sscanf32 
scanned 1 elements : 2147483647
Assigned j = -1

The assigned value has, as expected, been converted to -1, the value whose two's complement representation has all bits set (-1 = -2^31 + 2^30 + ... + 2^0). The scanned value, on the other hand, has apparently had its most significant bit dropped, which shrank it to 2147483647 = 2^31 - 1.

So my question is: what justifies this difference in the treatment of the maximum n-bit unsigned integer value on an n-bit machine (the same behavior occurs on a 64-bit architecture)?
Is a programmer not entitled to expect that sscanf treats the value the same way an assignment does on a given architecture?


Solution

  • Converting an integer value to int by cast or assignment, when the value is not representable by int but is representable by some supported type with a larger range, either produces an implementation-defined value in the int or raises an implementation-defined signal (C11 §6.3.1.3). Almost all implementations nowadays define this conversion such that int x = UINT_MAX; sets x to −1. The only exception I am aware of is Unisys (née Burroughs) mainframes, which still use ones' complement representation for negative numbers.

    By contrast, all of the scanf functions have undefined behavior upon reading a number which is outside of the representable range for the type of the variable the number will be written to (C11 §7.21.6.2p10). That means, not only can you not count on it to do the same thing that integer conversion does, you can't count on it to do anything constructive at all, and the compiler would in fact be entitled to generate machine code that makes demons fly out of your nose.

    It is my considered opinion that 7.21.6.2p10 is a defect in the standard, but since I consider the scanf family unfit for purpose anyway (this is only one of many problems with them), I can't be bothered to file a DR. Use the strto* functions instead. They have well-defined and documented overflow behavior: when the value is out of range, they return the nearest limit of the result type (for example LONG_MAX or LONG_MIN for strtol) and set errno to ERANGE.
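
    To make the strto* suggestion concrete, here is a minimal sketch of reading an int with strtol. The helper name parse_int and its return codes are made up for illustration and are not part of any standard API. It relies only on documented strtol behavior (the end pointer and errno == ERANGE), plus an explicit range check for platforms where long is wider than int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (name and return codes are assumptions):
 * parse a decimal int from `text`.
 *   returns  0 and stores the value in *out on success,
 *   returns -1 if there are no digits or there are trailing characters,
 *   returns -2 if the value does not fit in an int. */
int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;                       /* strtol only sets errno on failure */
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')
        return -1;                   /* nothing parsed, or junk after the number */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -2;                   /* out of range for long, or for int */

    *out = (int)value;               /* safe: value is known to fit in int */
    return 0;
}
```

    With this helper, the input "4294967295" from the question is rejected with the out-of-range code instead of silently producing some value, whether long is 32 or 64 bits wide. What to do on overflow (reject, clamp, or wrap) is then an explicit decision in your own code.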