Search code examples
modulointeger-divisioncpu-speedprogramming-pearls

Why is modulus operator slow?


Paraphrasing from in "Programming Pearls" book (about c language on older machines, since book is from the late 90's):

Integer arithmetic operations (+, -, *) can take around 10 nano seconds whereas the % operator takes up to 100 nano seconds.

  • Why there is that much difference?
  • How does a modulus operator work internally?
  • Is it same as division (/) in terms of time?

Solution

  • The modulus/modulo operation is usually understood as the integer equivalent of the remainder operation - a side effect or counterpart to division.

    Except for some degenerate cases (where the divisor is a power of the operating base - i.e. a power of 2 for most number formats) this is just as expensive as integer division!

    So the question is really, why is integer division so expensive?

    I don't have the time or expertise to analyze this mathematically, so I'm going to appeal to grade school maths:

    Consider the number of lines of working out in the notebook (not including the inputs) required for:

    • Equality: (Boolean operations) essentially none - in computer "big O" terms this is known a O(1)
    • addition: two, working left to right, one line for the output and one line for the carry. This is an O(N) operation
    • long multiplication: n*(n+1) + 2: two lines for each of the digit products (one for total, one for carry) plus a final total and carry. So O(N^2) but with a fixed N (32 or 64), and it can be pipelined in silicon to less than that
    • long division: unknown, depends upon the argument size - it's a recursive descent and some instances descend faster than others (1,000,000 / 500,000 requires less lines than 1,000 / 7). Also each step is essentially a series of multiplications to isolate the closest factors. (Although multiple algorithms exist). Feels like an O(N^3) with variable N

    So in simple terms, this should give you a feel for why division and hence modulo is slower: computers still have to do long division in the same stepwise fashion tha you did in grade school.

    If this makes no sense to you; you may have been brought up on school math a little more modern than mine (30+ years ago).


    The Order/Big O notation used above as O(something) expresses the complexity of a computation in terms of the size of its inputs, and expresses a fact about its execution time. http://en.m.wikipedia.org/wiki/Big_O_notation

    O(1) executes in constant (but possibly large) time. O(N) takes as much time as the size of its data-so if the data is 32 bits it takes 32 times the O(1) time of the step to calculate one of its N steps, and O(N^2) takes N times N (N squared) the time of its N steps (or possibly N times MN for some constant M). Etc.


    In the above working I have used O(N) rather than O(N^2) for addition since the 32 or 64 bits of the first input are calculated in parallel by the CPU. In a hypothetical 1 bit machine a 32 bit addition operation would be O(32^2) and change. The same order reduction applies to the other operations too.