Tags: c++, c, visual-c++, x86-64, 128-bit

Compute (a*b)%n FAST for 64-bit unsigned arguments in C(++) on x86-64 platforms?


I'm looking for a fast way to compute (a·b) mod n (in the mathematical sense) for a, b, n of type uint64_t. I could live with preconditions such as n != 0, or even a < n && b < n.

Note that the C expression (a*b)%n won't cut it, because the product is truncated to 64 bits (unsigned multiplication wraps modulo 2^64). What I want is (uint64_t)(((uint128_t)a*b)%n), except that, as far as I know, Visual C++ has no uint128_t.
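On GCC and Clang targeting x86-64 (but not MSVC), the wished-for uint128_t exists as the extension type unsigned __int128, so the expression can be written almost verbatim. The sketch below contrasts it with the truncating 64-bit expression; the names mulmod_naive and mulmod_u128 are illustrative, not from the question.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// The truncating version from the question: a*b wraps modulo 2^64
// before the % is applied, so the result is wrong for large products.
static inline uint64_t mulmod_naive(uint64_t a, uint64_t b, uint64_t n) {
    return (a * b) % n;
}

// The intended expression, spelled with the GCC/Clang 128-bit extension type.
// Not available in MSVC.
static inline uint64_t mulmod_u128(uint64_t a, uint64_t b, uint64_t n) {
    assert(n != 0);
    return (uint64_t)(((unsigned __int128)a * b) % n);
}
```

With a = b = 2^32 and n = 2^64 - 1, the true product 2^64 leaves remainder 1, while the 64-bit product wraps to 0 and the naive version returns 0. GCC and Clang typically lower a 128-bit % to a runtime library call (__umodti3) rather than a single div instruction, so this is portable across those two compilers but not necessarily the fastest option.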

I'm after a Visual C++ (preferably) or GCC/Clang intrinsic that makes the best use of the hardware available on x86-64 platforms or, if that can't be done, a portable inline function.
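If neither a 128-bit type nor an intrinsic is available, one portable (but much slower) fallback is the double-and-add method: walk the bits of b and accumulate multiples of a, reducing modulo n at every step so that no intermediate value ever needs more than 64 bits. This is a minimal sketch assuming only standard C++; addmod and mulmod_portable are hypothetical helper names. It needs up to 64 loop iterations, so it does not meet the "FAST" requirement and only serves as a last resort.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// (x + y) % n for x, y < n, without ever overflowing 64 bits:
// x + y >= n exactly when x >= n - y, and n - y is always > 0.
static inline uint64_t addmod(uint64_t x, uint64_t y, uint64_t n) {
    return (x >= n - y) ? x - (n - y) : x + y;
}

// (a * b) % n by double-and-add; works for any a, b and n > 0.
static inline uint64_t mulmod_portable(uint64_t a, uint64_t b, uint64_t n) {
    assert(n != 0);
    a %= n;
    b %= n;
    uint64_t r = 0;          // invariant: r < n and a < n
    while (b != 0) {
        if (b & 1) {
            r = addmod(r, a, n);   // add the current multiple of a
        }
        a = addmod(a, a, n);       // a <- 2a mod n
        b >>= 1;
    }
    return r;
}
```

The key detail is addmod: because both operands are already below n, comparing x against n - y decides whether the sum reaches n without ever forming a sum that could wrap around.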


Solution

  • Seven years later, I got a solution working in Visual Studio 2019, based on the _umul128 and _udiv128 intrinsics:

    #include <stdint.h>
    #include <intrin.h>
    #pragma intrinsic(_umul128)
    #pragma intrinsic(_udiv128)
    
    // compute (a*b)%n with a 128-bit intermediate result
    // assumes n>0 and a*b < n * 2**64 (always the case when a<=n || b<=n),
    // i.e. the high half of the product is < n, so the quotient fits in 64 bits
    inline uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n) {
      uint64_t r, s = _umul128(a, b, &r);  // s = low 64 bits, r = high 64 bits
      (void)_udiv128(r, s, n, &r);         // discard quotient, remainder goes to r
      return r;
    }
    
    // compute (a*b)%n with a 128-bit intermediate result
    // assumes n>0; also works when a*b >= n * 2**64, because a%n < n
    inline uint64_t mulmod1(uint64_t a, uint64_t b, uint64_t n) {
      uint64_t r, s = _umul128(a % n, b, &r);
      (void)_udiv128(r, s, n, &r);
      return r;
    }
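For compilers that lack _umul128 and _udiv128, the two steps of mulmod1 can be emulated portably, which also makes explicit what the intrinsics compute: _umul128 returns the low 64 bits of the product and stores the high 64 bits through its pointer argument, and _udiv128 divides the 128-bit value hi:lo by n. Reducing a first gives a % n < n, hence (a % n)·b < n·2^64, i.e. hi < n, which is exactly the condition that keeps the 128-by-64 quotient within 64 bits. The names umul128_portable, urem128_portable and mulmod1_portable are hypothetical; the remainder step uses bit-by-bit long division and is far slower than the single div instruction behind _udiv128.

```cpp
#include <cassert>
#include <cstdint>

using std::uint64_t;

// Portable stand-in for _umul128: returns the low 64 bits of a*b and
// stores the high 64 bits in *hi, built from four 32x32->64 partial products.
static inline uint64_t umul128_portable(uint64_t a, uint64_t b, uint64_t *hi) {
    const uint64_t mask = 0xFFFFFFFFu;
    uint64_t a_lo = a & mask, a_hi = a >> 32;
    uint64_t b_lo = b & mask, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;   // contributes to bits 0..63
    uint64_t p1 = a_lo * b_hi;   // contributes to bits 32..95
    uint64_t p2 = a_hi * b_lo;   // contributes to bits 32..95
    uint64_t p3 = a_hi * b_hi;   // contributes to bits 64..127
    // Middle 32-bit column: carry out of p0 plus the low halves of p1 and p2.
    // Each term is < 2^32, so the sum cannot overflow.
    uint64_t mid = (p0 >> 32) + (p1 & mask) + (p2 & mask);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return (mid << 32) | (p0 & mask);
}

// Portable stand-in for the remainder output of _udiv128:
// returns (hi * 2^64 + lo) % n. Requires hi < n, the same condition under
// which the quotient of the 128-by-64 division fits in 64 bits.
static inline uint64_t urem128_portable(uint64_t hi, uint64_t lo, uint64_t n) {
    assert(n != 0 && hi < n);
    uint64_t r = hi;                       // invariant: r < n
    for (int i = 63; i >= 0; --i) {
        uint64_t bit = (lo >> i) & 1;
        // r = (2*r + bit) % n; since 2*r + bit < 2n, one subtraction suffices.
        bool carry = (r >> 63) != 0;       // bit pushed out by the shift
        r = (r << 1) | bit;
        if (carry || r >= n) {
            r -= n;                        // wraps back to the correct value if carry
        }
    }
    return r;
}

// Same structure as mulmod1: reduce a, multiply to 128 bits, take the remainder.
static inline uint64_t mulmod1_portable(uint64_t a, uint64_t b, uint64_t n) {
    assert(n != 0);
    uint64_t hi, lo = umul128_portable(a % n, b, &hi);
    return urem128_portable(hi, lo, n);
}
```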