Search code examples
c++coptimizationcompiler-optimizationintrinsics

Efficient overflow-immune arithmetic mean in C/C++


The arithmetic mean of two unsigned integers is defined as:

mean = (a+b)/2

Directly implementing this in C/C++ may overflow and produce a wrong result. A correct implementation would avoid this. One way of coding it could be:

mean = a/2 + b/2 + (a%2 + b%2)/2

But this produces rather a lot of code with typical compilers. In assembler, this usually can be done much more efficiently. For example, the x86 can do this in the following way (assembler pseudo code, I hope you get the point):

ADD a,b   ; addition, leaving the overflow condition in the carry bit
RCR a,1   ; rotate right through carry, effectively a division by 2

After those two instructions, the result is in a, and the remainder of the division is in the carry bit. If correct rounding is desired, a third ADC instruction would have to add the carry into the result.

Note that the RCR instruction is used, which rotates a register through the carry. In our case, it is a rotate by one position, so that the previous carry becomes the most significant bit in the register, and the new carry holds the previous LSB from the register. It seems that MSVC doesn't even offer an intrinsic for this instruction.

Is there a known C/C++ pattern that can be expected to be recognized by an optimizing compiler so that it produces such efficient code? Or, more generally, is there a rational way how to program in C/C++ source level so that the carry bit is being used by the compiler to optimize the generated code?

EDIT:

A 1-hour lecture about std::midpoint: https://www.youtube.com/watch?v=sBtAGxBh-XI

Wow!

EDIT2: Great discussion on Microsoft blog


Solution

  • The following method avoids overflow and should result in fairly efficient assembly (example) without depending on non-standard features:

        mean = (a&b) + (a^b)/2;