My C11 standard is from here. This paragraph says:
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.[61]
and footnote 61 says:
The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX+1).
My confusion is mainly about unsigned int. My current understanding is the following:
float a = 3.14;
uint32_t b = (uint32_t)a; // defined, b == 3

float a = -1.23;
uint32_t b = (uint32_t)a; // UB!

float a = 2147483646.0; // defined
uint32_t b = (uint32_t)a; // defined, b == 2147483646
uint8_t c = (uint8_t)a; // UB!
Is this correct?
Footnote 61 clarifies the range of floating-point values that can be cast to an unsigned integer type without undefined behavior.
An unsigned integer type can represent values in the range [0; Utype_MAX]. Hence any floating-point value whose integral part lies in this interval can be cast to that unsigned integer type, which means values x where x > -1 and x < Utype_MAX+1. This is the statement in the last part of footnote 61.
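The range condition above can be turned into an explicit check before the cast. The following is only a minimal sketch for uint32_t; the helper name float_to_u32 is hypothetical and not part of any standard library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: converts x to uint32_t only when the cast is defined,
   i.e. when x lies in the open interval (-1, UINT32_MAX + 1).
   Returns false (and leaves *out untouched) otherwise. */
static bool float_to_u32(float x, uint32_t *out)
{
    /* 4294967296.0f is exactly 2^32 and is representable as a float.
       Both comparisons are false for NaN, so NaN is rejected as well. */
    if (x > -1.0f && x < 4294967296.0f) {
        *out = (uint32_t)x;   /* fractional part is discarded */
        return true;
    }
    return false;
}
```

With this sketch, float_to_u32(3.14f, &v) succeeds with v == 3 and float_to_u32(-0.5f, &v) succeeds with v == 0, while float_to_u32(-1.23f, &v) is rejected instead of invoking undefined behavior.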
The general rule is that when an operation on unsigned integers produces a number outside the range [0; Utype_MAX], the result is reduced modulo Utype_MAX+1 (also referred to as "wrap-around"). E.g. when adding two 16-bit unsigned integers, 40000+40000=80000, which is not representable in 16 bits, so the result is reduced modulo 65536 to 14464.
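The 16-bit example can be reproduced in a few lines. This is a small sketch; add_u16 is a hypothetical name chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: adds two 16-bit unsigned values.
   The operands are first converted to unsigned int, so the sum is computed
   in unsigned arithmetic; converting it back to uint16_t reduces it
   modulo 65536. */
static uint16_t add_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)((unsigned int)a + (unsigned int)b);
}
```

Calling add_u16(40000, 40000) yields 14464, which is 80000 - 65536.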
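However, this wrap-around need not be performed when casting a floating-point number to an unsigned integer, so a floating-point value whose integral part is not representable in the target type gives undefined behavior. This is the first statement in footnote 61.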
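If wrap-around is actually wanted for a floating-point value, one possible approach is to go through a wide signed integer type first: the floating-to-integer step is defined as long as the value fits in that type, and the following integer-to-unsigned conversion wraps. The sketch below assumes int64_t is available; the name double_to_u32_wrapping is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: converts x to uint32_t with modulo-2^32 wrap-around.
   Returns false when x is NaN or does not fit in int64_t, because then even
   the intermediate conversion would be undefined. */
static bool double_to_u32_wrapping(double x, uint32_t *out)
{
    /* -2^63 and 2^63 are exactly representable as doubles. */
    if (x >= -9223372036854775808.0 && x < 9223372036854775808.0) {
        int64_t wide = (int64_t)x;   /* defined: truncated value fits in int64_t */
        *out = (uint32_t)wide;       /* integer conversion: reduced modulo 2^32 */
        return true;
    }
    return false;
}
```

Here double_to_u32_wrapping(-1.0, &v) gives v == 4294967295, the same result as the integer conversion (uint32_t)-1, whereas the direct cast (uint32_t)-1.0 is undefined.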