c++ language-lawyer c++20 intrinsics avx2

Is using C++20's std::popcount with vector optimization equivalent to the popcnt intrinsic?


C++20 introduces many new functions such as std::popcount. I currently get the same functionality from an Intel intrinsic.

I compiled both options, which can be seen in the Compiler Explorer code:

  1. Using Intel's AVX2 intrinsic
  2. Using std::popcount and GCC compiler flag "-mavx2"

It looks like the generated assembly code is the same, apart from the type checks used in std's template.

For OS-agnostic code that gets the same optimizations, is it right to assume that using std::popcount with the appropriate compiler vector optimization flags is better than using intrinsics directly?

Thanks.


Solution

  • Technically, no (but practically, yes). The C++ standard specifies only the behavior of std::popcount, not its implementation (see [bit.count]).

    Implementers may achieve this behavior however they like. They can use the popcnt intrinsic, but they could equally write a while loop:

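    // Portable fallback: examine one bit per iteration.
    template<class T>
    constexpr int popcount(T x) noexcept
    {
        int set_bits = 0;
        while (x)
        {
            if (x & 1)
                ++set_bits;
            x >>= 1;
        }
        return set_bits;
    }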
    

    This is the entire wording in the standard at [bit.count]:

    template<class T>
    constexpr int popcount(T x) noexcept;
    

    Constraints: T is an unsigned integer type ([basic.fundamental]).
    Returns: The number of 1 bits in the value of x.
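
    The Constraints clause can be observed directly. The following is a minimal sketch, assuming a C++20 compiler; the concept name popcountable is hypothetical and introduced only for illustration. It checks which argument types std::popcount accepts and confirms that the result is usable at compile time.

    ```cpp
    #include <bit>
    #include <cstdint>

    // Hypothetical concept for illustration: satisfied when
    // std::popcount(x) is a valid call for an x of type T.
    template <class T>
    concept popcountable = requires(T x) { std::popcount(x); };

    // Unsigned integer types satisfy the Constraints clause.
    static_assert(popcountable<unsigned int>);
    static_assert(popcountable<std::uint64_t>);

    // Signed types are excluded, so the call does not participate
    // in overload resolution.
    static_assert(!popcountable<int>);

    // popcount is constexpr, so results can be checked at compile time.
    static_assert(std::popcount(0b1011u) == 3);
    static_assert(std::popcount(std::uint64_t{0}) == 0);
    ```

    Nothing here says how the count is computed, which is the point: the standard fixes the interface and the result, not the instructions.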

    Realistically? Compiler and standard library writers will use intrinsics or builtins wherever they can. For example, GCC's implementation appears to be fairly heavily optimized.
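
    As a rough illustration of what "using a builtin where available" could look like, here is a minimal sketch. The function name popcount_sketch is hypothetical, and this is not the actual source of any standard library. It assumes the GCC/Clang builtin __builtin_popcountll and falls back to the bit-by-bit loop elsewhere.

    ```cpp
    // Hypothetical helper for illustration; not any library's real code.
    constexpr int popcount_sketch(unsigned long long x) noexcept
    {
    #if defined(__GNUC__) || defined(__clang__)
        // Assumption: GCC/Clang builtin. The compiler chooses the
        // instruction sequence based on the target flags in effect.
        return __builtin_popcountll(x);
    #else
        // Portable fallback: examine one bit per iteration.
        int set_bits = 0;
        while (x)
        {
            if (x & 1)
                ++set_bits;
            x >>= 1;
        }
        return set_bits;
    #endif
    }

    // Evaluated at compile time, so this holds on either path.
    static_assert(popcount_sketch(0b1011ull) == 3, "three bits set");
    ```

    With GCC on x86-64, whether the builtin becomes a single popcnt instruction depends on the target flags (for example -mpopcnt or a suitable -march). Without them, GCC typically emits a call to a library routine instead. The source-level function is the same either way; the flags decide the generated code.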