c++optimization division signed integer-division

Why A / <constant-int> is faster when A is unsigned vs signed?

I have been reading through the Optimizing C++ wikibook. In the faster operations chapter one of the advice is as follows:

Integer division by a constant

When you divide an integer (that is known to be positive or zero) by a constant, convert the integer to unsigned.

If s is a signed integer, u is an unsigned integer, and C is a constant integer expression (positive or negative), the operation s / C is slower than u / C, and s % C is slower than u % C. This is most significant when C is a power of two, but in all cases, the sign must be taken into account during division.

The conversion from signed to unsigned, however, is free of charge, as it is only a reinterpretation of the same bits. Therefore, if s is a signed integer that you know to be positive or zero, you can speed up its division using the following (equivalent) expressions: (unsigned)s / C and (unsigned)s % C.

I tested this statement with gcc and the u / C expression seems to perform consistently better than the s / c

The following example is also provided below:

#include <iostream>
#include <chrono>
#include <cstdlib>
#include <vector>
#include <numeric>

using namespace std;

int main(int argc, char *argv[])
{

    constexpr int vsize = 1e6;
    std::vector<int> x(vsize);
    std::iota(std::begin(x), std::end(x), 0); //0 is the starting number

    constexpr int a = 5;

  auto start_signed = std::chrono::system_clock::now();
  int sum_signed = 0;
    for ([[gnu::unused]] auto  i : x)
    {
        // signed is by default
        int v = rand() % 30 + 1985;   // v in the range 1985-2014

        sum_signed += v / a;
    }
  auto end_signed = std::chrono::system_clock::now();

  auto start_unsigned = std::chrono::system_clock::now();
  int sum_unsigned = 0;
    for ([[gnu::unused]] auto  i : x)
    {
        int v = rand() % 30 + 1985;   // v in the range 1985-2014
        sum_unsigned += static_cast<unsigned int>(v) / a;
    }
  auto end_unsigned = std::chrono::system_clock::now();

  // signed
  std::chrono::duration<double> diff_signed = end_signed - start_signed;
  std::cout << "sum_signed: " << sum_signed << std::endl;
  std::cout << "Time it took SIGNED: " << diff_signed.count() * 1000 << "ms" << std::endl;

  // unsigned
  std::chrono::duration<double> diff_unsigned = end_unsigned - start_unsigned;
  std::cout << "sum_unsigned: " << sum_unsigned << std::endl;
  std::cout << "Time it took UNSIGNED: " << diff_unsigned.count() * 1000 << "ms" << std::endl;

  return 0;
}

You can compile and run the example here: http://cpp.sh/8kie3

Why is this happening?

Solution

After some toying around, I believe I've tracked down the source of the problem to be the guarantee by the standard that negative integer divisions are rounded towards zero since C++11. For the simplest case, which is division by two, check out the following code and the corresponding assembly (godbolt link).

constexpr int c = 2;

int signed_div(int in){
    return in/c;
}

int unsigned_div(unsigned in){
    return in/c;
}

Assembly:

signed_div(int):
  mov eax, edi
  shr eax, 31
  add eax, edi
  sar eax
  ret

unsigned_div(unsigned int):
  mov eax, edi
  shr eax
  ret

What do these extra instructions accomplish? shr eax, 31 (right shift by 31) just isolates the sign bit, meaning that if input is non-negative, eax == 0, otherwise eax == 1. Then the input is added to eax. In other words, these two instructions translate to "if input is negative, add 1 to it. The implications of the addition are the following (only for negative input).

If input is even, its least significant bit is set to 1, but the shift discards it. The output is not affected by this operation.
If input is odd, its least significant bit was already 1 so the addition causes a remainder to propagate to the rest of the digits. When the right shift occurs, the least significant bit is discarded and the output is greater by one than the output we'd have if we hadn't added the sign bit to the input. Because by default right-shift in two's complement rounds towards negative infinity, the output now is the result of the same division but rounded towards zero.

In short, even negative numbers aren't affected, and odd numbers are now rounded towards zero instead of towards negative infinity.

For non-power-of-2 constants it gets a bit more complicated. Not all constants give the same output, but for a lot of them it looks similar to the following (godbolt link).

constexpr int c = 3;

int signed_div(int in){
    return in/c;
}

int unsigned_div(unsigned in){
    return in/c;
}

Assembly:

signed_div(int):
  mov eax, edi
  mov edx, 1431655766
  sar edi, 31
  imul edx
  mov eax, edx
  sub eax, edi
  ret
unsigned_div(unsigned int):
  mov eax, edi
  mov edx, -1431655765
  mul edx
  mov eax, edx
  shr eax
  ret

We don't care about the change of the constant in the assembly output, because it does not affect execution time. Assuming that mul and imul take the same amount of time (which I don't know for sure but hopefully someone more knowledgeable than me can find a source on it), the signed version once again takes longer because it has extra instructions to handle the sign bit for negative operands.

Notes

Compilation was done on godbot using x86-64 GCC 7.3 with the -O2 flag.
Rounds towards zero behavior is standard-mandated since C++11. Before it was implementation defined, according to this cppreference page.