Search code examples
c++64-bitmultiplication32-bitintrinsics

MSVC's instrinsics __emulu and _umul128 in GCC/CLang


In MSVC there exist instrinsics __emulu() and _umul128(). First does u32*u32->u64 multiplication and second u64*u64->u128 multiplication.

Do same intrinsics exist for CLang/GCC?

Closest I found are _mulx_u32() and _mulx_u64() mentioned in Intel's Guide. But they produce mulx instruction which needs BMI2 support. While MSVC's intrinsics produce regular mul instruction. Also _mulx_u32() is not available in -m64 mode, while __emulu() and _umul128() both exist in 32 and 64 bit mode of MSVC.

You may try online 32-bit code and 64-bit code.

Of cause for 32-bit one may do return uint64_t(a) * uint64_t(b); (see it online) hoping that compiler will guess correctly and optimize to using u32*u32->u64 multiplication instead of u64*u64->u64. But is there a way to be sure about this? Not to rely on compiler's guess that both arguments are 32-bit (i.e. higher part of uint64_t is zeroed)? To have some intrinsics like __emulu() that make you sure about code.

There is __int128 in GCC/CLang (see code online) but again we have to rely on compiler's guess that we actually multiply 64-bit numbers (i.e. higher part of int128 is zeroed). Is there a way to be sure without compiler guessing, if there exist some intrinsics for that?

BTW, both uint64_t (for 32-bit) and __int128 (for 64-bit) produce correct mul instruction instead of mulx in GCC/CLang. But again we have to rely that compiler guesses correctly that higher part of uint64_t and __int128 is zeroed.

Of cause I can look into assembler code that GCC/Clang have optimized and guessed correctly, but looking at assembler once doesn't guarantee that same will happen always in all circumstances. And I don't know of a way in C++ to statically assert that compiler did correct guess about assembler instructions.


Solution

  • You already have the answer. Use uint64_t and __uint128_t. No intrinsics needed. This is available with modern GCC and Clang for all 64-bit targets. See Is there a 128 bit integer in gcc?

    #include <stdint.h>
    typedef __uint128_t uint128_t;
    
    // 32*32=64 multiplication
    f(uint32_t a, uint32_t b) {
       uint64_t ab = (uint64_t)a * b;
    }
    
    //64*64=128 multiplication
    f(uint64_t a, uint64_t b) {
        uint128_t ab = (uint128_t)a * b;
    }
    

    Note that the cast must be on the operands, or at least on one operand. Casting the result would not work since it would do the multiplication with the shorter type and extend the result.

    But is there a way to be sure about this? Not to rely on compiler's guess

    You get exactly the same guarantee as with compiler intrinsics: that the value of the result is correct. There are never any guarantees about optimization. Just because you used intrinsics doesn't guarantee that the compiler will emit the “obvious” assembly instruction. The only way to have this guarantee is to use inline assembly, and for a simple operation like this it's likely to hurt performance because it would restrict the ways in which the compiler optimizes register usage.