C++20 introduces many new functions, such as std::popcount. I currently get the same functionality from an Intel intrinsic.
I compiled both options; the result can be seen in Compiler Explorer.
The generated assembly looks the same, apart from the type checks used in the std template.
For OS-agnostic code that gets the same optimizations, is it right to assume that using std::popcount with the appropriate compiler vector optimization flags is better than using intrinsics directly?
Thanks.
Technically, no (but practically, yes). The C++ standard only specifies the behavior of popcount, not its implementation (see [bit.count]).
Implementers are allowed to do whatever they want to achieve this behavior, including using the popcnt intrinsic, but they could also write a while loop:
    int set_bits = 0;
    while (x)
    {
        if (x & 1)
            ++set_bits;
        x >>= 1;
    }
    return set_bits;
This is the entire wording in the standard at [bit.count]:
template<class T>
constexpr int popcount(T x) noexcept;
Constraints: T is an unsigned integer type ([basic.fundamental]).
Returns: The number of 1 bits in the value of x.
Realistically? Compiler writers are very smart and will optimize this to use intrinsics as much as possible. For example, GCC's implementation appears to be fairly heavily optimized.