Tags: c, modulus, speed-test

Why does modulus with unsigned int run faster than with unsigned long long?


Why test the speed of modulus?


I have an app where a modulus operation is performed millions of times per second. I have to work with very big numbers, so I chose unsigned long long as the data type. About a week ago I wrote a new algorithm for my app that performs the modulus operation on numbers which are much smaller than the ones I used to work with (e.g. 26 instead of 10000000), so I chose unsigned int as the data type. The speed increased dramatically even though the algorithm is almost the same.

Testing...


I've written two simple programs in C to test the speed of modulus calculation.

#include <stdio.h>

typedef unsigned long long ull;

int main(){
   puts("Testing modulus with ull...");
   ull cnt;
   ull k, accum=0;
   for(k=1, cnt=98765432;k<=10000000;++k,--cnt) 
      accum+=cnt%80;
   printf("%llu\n",accum);
   return 0;
}

The only thing I changed between the two programs was the type of the variable cnt.
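For reference, the unsigned int variant presumably differed only in that one declaration (this is my reconstruction; the original post only shows the ull version):

#include <stdio.h>

typedef unsigned long long ull;

int main(){
   puts("Testing modulus with unsigned int...");
   unsigned int cnt;               /* the only change: cnt is unsigned int here */
   ull k, accum=0;
   for(k=1, cnt=98765432;k<=10000000;++k,--cnt)
      accum+=cnt%80;
   printf("%llu\n",accum);         /* accum is still ull, so %llu still matches */
   return 0;
}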

I tested these programs with time ./progname and the results were as follows.

  • With unsigned long long: 3.28 sec
  • With unsigned int: 0.33 sec

Note: I'm testing on a jailbroken iPad, which is why it takes so long.

Why?


Why does the version with unsigned long long take so much time to run?

Update 1: added --cnt to the loop so that cnt%80 is not constant; the results are still the same.

Update 2: removed the printf call and added accum instead, to get rid of the time taken by printf; the results are much lower now but still quite different.


Solution

  • Fundamentally, the time it takes to perform an arithmetic operation scales at least linearly with the number of bits in the operands. For modern CPUs, the time is constant (usually one cycle) for addition, subtraction, logical operations, and possibly multiplication when the operands fit in registers, but scale up to RSA-sized operands or other "bignum" usage and you will see clearly how the time to perform arithmetic grows.

    Division and remainder operations are inherently more costly, and you will often notice a significant difference between operand sizes. And of course, if your CPU is 32-bit, a 64-bit division/remainder operation has to be constructed out of multiple smaller operations (much like a small special case of "bignum" arithmetic), so it is going to be considerably slower (see the first sketch at the end of this answer).

    Your test, however, is just completely invalid. As originally written, the remainder being computed was constant, so it did not even need to be recomputed on each loop iteration; the time spent in the loop should have been dominated by printf; and the format specifier used with printf was not valid for printing an argument of type unsigned long long, so the program had undefined behavior (see the second sketch below).
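    To make the point about operand width concrete, here is a minimal sketch (my own example, not from the answer above) of two functions that differ only in the width of the dividend. On a 32-bit target, compilers typically turn the 64-bit remainder into a call to a runtime-library helper (for example __aeabi_uldivmod on 32-bit ARM or __udivmoddi4 in libgcc), while the 32-bit remainder can usually be done with a single hardware divide or a short multiply-by-reciprocal sequence.

    #include <stdint.h>

    /* Remainder by a constant, 32-bit operand: on most 32-bit targets this
       compiles to one divide or a short multiply-by-reciprocal sequence. */
    uint32_t rem32(uint32_t x){
       return x % 80u;
    }

    /* Remainder by the same constant, 64-bit operand: on a 32-bit target this
       typically becomes a call into the compiler's runtime library
       (e.g. __aeabi_uldivmod on ARM EABI), i.e. many instructions per modulus. */
    uint64_t rem64(uint64_t x){
       return x % 80u;
    }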
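    As an aside on the format-specifier point, a mismatched conversion specification makes the behavior undefined. The following is a small illustration of my own, not the asker's exact code:

    #include <stdio.h>

    int main(void){
       unsigned long long big = 98765432ULL;
       unsigned int small = 26u;
       /* printf("%u\n", big);   wrong specifier for unsigned long long: undefined behavior */
       printf("%llu\n", big);    /* %llu matches unsigned long long */
       printf("%u\n", small);    /* %u matches unsigned int */
       return 0;
    }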