Finding the size of some data in bytes is a common operation.
Contrived example:
char *buffer_size(int x, int y, int chan_count, int chan_size)
{
size_t buf_size = x * y * chan_count * chan_size; /* <-- this may overflow! */
char *buf = malloc(buf_size);
return buf;
}
The obvious error here is the ints will overflow (an 23171x23171 RGBA byte buffer for eg).
What are the rules for promotion when multiplying 3 or more values?
(Multiplying a pair of values is simple)
We could play it safe and just cast:
size_t buf_size = (size_t)x * (size_t)y * (size_t)chan_count * (size_t)chan_size;
Another alternative is to add in parenthesis to ensure the order of multiplication & promotion is predictable (and automatic promotion between pairs works as expected) ...
size_t buf_size = ((((size_t)x * y) * chan_count) * chan_size;
... which works, but my question is.
Is there a deterministic way to multiply 3 or more values to ensure they will automatically promoted?
(to avoid overflow)
Or is this undefined behavior?
Notes...
size_t
here won't prevent overflow, it just prevents overflowing the maximum value for that type.size_t
too, but thats not the point of this question.In C (and C++), the type of an arithmetic operator is determined as follows:
Both operands are converted to the same type, using the "usual arithmetic conversions".
That's the type of the result.
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions [Note 1] [Note 2]
There is no other rule, so there is no special case for expressions with two or more operators. Each operation is typed independently, according to the syntax.
The result type is not automatically widened in order to avoid or reduce the probability of overflow; the operands are both converted to a common type "which is also the type of the result". So if you multiply two int
s, the result will be an int
and overflow will result in undefined behaviour. [Note 3]
The syntax of the language(s) precisely defines how a full expression is grouped, and evaluation is required to conform to the syntax. The expression a + b + c
must have the same result as the expression (a + b) + c
, because the syntax requires that grouping. The compiler may rearrange the computation as it sees fit, provided it can demonstrate that the result is semantically identical for all valid inputs. But it cannot decide to change the result types of any operators. a + b + c
must have the type which results from applying the usual arithmetic conversions to the types of a
and b
, and then applying them again to that type and the type of c
. [Note 4]
The usual arithmetic conversions are detailed in §6.3.1.8 ("Usual arithmetic conversions") of the C standard, and in paragraph 10 of the introduction to §5 (Expressions) of C++. Roughly speaking, it goes like this:
If both operands are floating point, both operands are converted to the wider of the two types; if one operand is floating point, the other is converted to that floating point type.
Otherwise, if both operands are signed integral types, they are both converted to the widest of the two types and int
.
Otherwise, if both operands are unsigned integral types at least as large as unsigned int
, they are both converted to the wider of the two types.
[Note 5]
Now, take the case of a * b * c * d
, where a
, b
, c
and d
are all int
and the desire is to produce a size_t
.
Syntactically, that expression is equivalent to (((a * b) * c) * d)
, and the usual arithmetic conversions are applied accordingly operation by operation. If you convert a
to size_t
with a cast ((size_t)a * b * c * d
), the conversions will be applied as though it were parenthesized. So the operands and the result of (size_t)a * b
would be size_t
, and therefore so will be the result of (size_t)a * b * c
and thus (size_t)a * b * c * d
. In other words, all the operands will be converted to unsigned size_t
values and all the multiplications will be performed as unsigned size_t
multiplications. That's well-defined but probably meaningless if any of the values happen to be negative.
Either the second or the third multiplication could exceed the capacity of a size_t
, but since size_t
is unsigned, the computation will be performed modulo 2N where N
is the number of value bits in size_t
. The cast, therefore, is not safe in the sense that it avoids overflow, but it does at least avoid undefined behaviour.
The quote is from the C++ standard, §5, paragraph 10. The C standard has a slightly more complicated version in §6.3.1.8, because C11 includes complex arithmetic types. For integer (and non-complex floating-point) operands, C and C++ have identical semantics.
The shift operators are exceptions, which is why it says "many binary operators". The result type of a shift operator is precisely the (possibly promoted) type of its left operand, regardless of the type of the right operand. All bitwise operators are restricted to integers, so the part of the "usual arithmetic conversions" which involve real numbers don't apply to those operators.
If you multiply two unsigned int
s, the result will be an unsigned int
and the computation is defined for all values:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type. (C §6.2.5/9)
Both C and C++ standards are very clear on this point, and include examples to drive it home. In general, neither signed integer nor floating-point operators are associative, so it is probably only possible to regroup and rearrange a computation if that computation involves only unsigned integer arithmetic.
An example of a case where regrouping of integer arithmetic would be prohibited appears as Example 6 in §5.1.2.3 of the C standard and as paragraph 9 of §1.9 of the C++ standard. (It's the same example.) Suppose we have a machine with 16-bit int
s, where signed overflow results in a trap. In that case, a = a + 32760 + b + 5;
cannot be rewritten as a = (a + b) + 32765;
:
if the values for a and b were, respectively, −32754 and −15, the sum a + b would produce a trap while the original expression would not;
Those are the simple, untroublesome cases. Normally you should try to avoid the other ones, but for the record:
a. Before the above happens, if the type of either operand is narrower than int
, then that operand will be promoted to either int
or unsigned int
. Normally, it will be promoted to int
, even if it was unsigned. Only if int
is not wide enough to represent all values of the type will the operand be promoted to unsigned int
. For example, on most architectures an unsigned char
operand will be promoted to an int
, not an unsigned int
(Although architectures in which char
and int
are the same width are possible, they are not common.)
b. Finally, if one type is signed and the other is unsigned, then they will both be converted to:
the unsigned type if it is at least as wide as the signed type. (Eg. unsigned int
* int
=> unsigned int
)
the signed type if it is wide enough to hold all the values of the unsigned type. (Eg. unsigned int
* long long
=> long long
if long long
is wider than int
)
the unsigned type corresponding to the signed type if none of the above cases hold.