c++ assembly c++20 constexpr inline-assembly

Why can't inline assembly be constexpr?


Is there a good reason why C++ doesn't allow inline assembly in constant expressions? And why is unevaluated inline assembly allowed in constexpr functions in C++20?


Solution

  • Compilers that support asm() or asm{} as an extension choose not to define it in a way compatible with constexpr. In fact, ISO C++ specifically says they shouldn't: 7.7 [expr.const] 5.27 lists an asm statement as one of the things that prevent an expression from being a constant expression.

    Real compilers don't optimize your inline asm, and that includes not constant-propagating through it (see "When will compilers optimize assembly code in C/C++ source?"). Optimizing it would defeat the purpose of inline asm, which is to run exactly the instructions you specify. (Or not run them at all if the result isn't needed, in the case of GNU C asm() without volatile.)

    So compilers don't know how to constant-evaluate an asm statement, and doing so would be the opposite of what makes sense: you're expressing program logic in a totally different language, one that compilers don't normally have to read as input. This is likely why the ISO C++ standard forbids asm("") in constant expressions.

    Implementing this would be a lot of work for compiler writers (for constant propagation, even leaving constexpr aside). Inline asm is basically a second language: instead of only scanning it for used registers (as MSVC or clang -fasm-blocks do), they'd have to actually interpret / simulate the asm code, including potential loops. They'd presumably also have to bail out on memory access, unless they could prove that an addressing mode refers to a known constant C object. It sounds like a total mess, and definitely not something compilers would want to do.
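
    As a minimal sketch of that [expr.const] rule (assuming GCC or Clang compiling as C++20; the names twice and twice_runtime_demo are made up for illustration): an asm statement may appear in a constexpr function, but a call is only a constant expression if evaluation never reaches it. The empty asm("") keeps the sketch independent of the target architecture.

```cpp
#include <cassert>

// Hypothetical example function: the asm statement sits on a branch that
// only negative arguments reach.
constexpr int twice(int x) {
    if (x < 0) {
        asm("");  // allowed to appear here in C++20, but never constant-evaluable
    }
    return 2 * x;
}

// OK: evaluating twice(3) never reaches the asm statement.
static_assert(twice(3) == 6, "asm branch not evaluated");

// Expected to be rejected if uncommented, because evaluation reaches asm:
// static_assert(twice(-1) == -2, "");

// At run time both branches are fine.
inline void twice_runtime_demo() {
    assert(twice(3) == 6);
    assert(twice(-1) == -2);
}
```

    This is only meant to show where the line is drawn: the function as a whole stays usable in constant expressions, and only the calls that would execute the asm statement are rejected.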


    Use intrinsics

    If you want access to machine-specific instructions in a way the compiler can understand and optimize, use intrinsics. See https://gcc.gnu.org/wiki/DontUseInlineAsm for that and other reasons not to use it.

    Some intrinsics and builtins are compatible with constexpr on some compilers, because ISO C++ doesn't prevent implementations from allowing that where it makes sense. (Nicol's argument only makes sense for compilers that don't define an asm extension at all, which is obviously not what you're asking about.)
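
    For instance, GCC and Clang accept __builtin_popcount inside a constexpr function. A small sketch, assuming one of those compilers (count_bits and count_bits_runtime are hypothetical names):

```cpp
// Assumes GCC or Clang: __builtin_popcount is a GNU extension, not ISO C++.
constexpr int count_bits(unsigned x) {
    return __builtin_popcount(x);
}

// The compiler evaluates the builtin itself; no machine instruction is involved.
static_assert(count_bits(0x555u) == 6, "0x555 has six set bits");
static_assert(count_bits(0u) == 0, "no bits set");

// The same function also accepts run-time values.
inline int count_bits_runtime(unsigned x) {
    return count_bits(x);
}
```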


    If you want a compile-time-constant result while still using inline asm at run time, branch on C++20 std::is_constant_evaluated(), C++23 if consteval, or GNU C __builtin_constant_p(x).

    // Normally don't actually do this, use C++20 std::popcount with appropriate -march
    // or __attribute__((target("popcnt"))) on a function that uses it in a loop.
    // (inlining doesn't work between functions with different target options.)
    
    constexpr int popcount(int x)
    {
       if (__builtin_constant_p(x)) {
           return __builtin_popcount(x);   // Yes, this GNU extension *is* constexpr compatible
                       // because compilers know how to popcount at compile time.
       } else {
          // An asm statement may appear in a constexpr function only since C++20 (-std=gnu++20).
          asm("popcnt %0, %0  # from asm statement" : "+r"(x));
          // Still somewhat optimization-defeating: "+r" forces the same register for input and output,
          // which works around the Intel false dependency and avoids needing dialect alternatives for AT&T vs. Intel.
          return x;
       }
    }
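
    The same dispatch can be sketched with the is_constant_evaluated mechanism instead of __builtin_constant_p. This is a sketch under assumptions, not a drop-in replacement: it assumes GCC or Clang, the names popcount_runtime_only, popcount_dispatch and dispatch_runtime_demo are hypothetical, and a plain non-constexpr function stands in for the inline-asm path so that the example does not depend on x86.

```cpp
#include <cassert>

// Hypothetical stand-in for the inline-asm path: deliberately NOT constexpr,
// so it can only ever execute at run time.
inline int popcount_runtime_only(unsigned x) {
    int n = 0;
    for (; x != 0; x &= x - 1) ++n;  // clear the lowest set bit each iteration
    return n;
}

constexpr int popcount_dispatch(unsigned x) {
    // std::is_constant_evaluated() from <type_traits> is the ISO C++20 spelling.
    // GCC and Clang provide this builtin for it, which also works in older -std modes.
    if (__builtin_is_constant_evaluated()) {
        int n = 0;
        for (; x != 0; x >>= 1) n += static_cast<int>(x & 1u);  // pure ISO C++ loop
        return n;
    } else {
        return popcount_runtime_only(x);
    }
}

// Constant evaluation takes the first branch and never touches the stand-in.
static_assert(popcount_dispatch(0x555u) == 6, "constant-evaluated branch");

// A run-time call takes the second branch.
inline void dispatch_runtime_demo(unsigned v) {
    assert(popcount_dispatch(v) == popcount_runtime_only(v));
}
```

    One difference worth keeping in mind: __builtin_constant_p(x) can become true after inlining and constant propagation, whereas is_constant_evaluated is only true where the language requires a constant expression. An ordinary run-time call with a literal argument therefore takes the second branch here, even if the optimizer could have folded it.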
    

    In the example on Godbolt with x86-64 asm, the constant arg takes the builtin path (we could instead have used a pure ISO C++ fallback, like a loop or bithack).

    But with a non-constant arg, the output contains our line of inline asm using popcnt (including the comment embedded in the asm statement's template string), even though we never told GCC the binary would only run on CPUs that have that instruction. (Related: "What exactly do the gcc compiler switches (-mavx -mavx2 -mavx512f) do?" and "The Effect of Architecture When Using SSE / AVX Intrinisics". __builtin_popcount is like the SSE or AVX intrinsics, except that it has a bithack fallback, so it always works; it just doesn't always compile to a single machine instruction.)

    int test_pop1() {
        return popcount(0x555);
    }
    
    # gcc12 -O2 -march=x86-64-v2 -mno-popcnt -std=gnu++20
            mov     eax, 6       # constant-folded from the __builtin_popcount branch
            ret
    
    # without the if(), we'd get mov eax, 0x555 ; popcnt eax,eax
    
    int test_pop_nonconst(int x) {
        return popcount(x);
    }
    
    # g++ -O2  without a -march that includes popcnt
            mov     eax, edi
            popcnt eax, eax  # from asm statement
            ret
    
    int test_builtin_popcount(int x) {
        return __builtin_popcount(x);
        // popcnt only with -march= new enough.  Otherwise bithack in helper function
    }
    
    # -O2  with no -march, or with -mno-popcnt
            sub     rsp, 8
            mov     edi, edi
            call    __popcountdi2   # libgcc helper function because popcnt might fault
            add     rsp, 8
            ret
    
    # -O2 -march=x86-64-v2 (SSE4.2 Nehalem baseline, generic tuning)
            xor     eax, eax        # break false dependency in case of Intel
            popcnt  eax, edi
            ret
    
    # -O2 -march=znver2
            popcnt  eax, edi        # Zen doesn't have false dependencies for popcnt
            ret