The Haswell architecture comes with several new instructions. One of them is PEXT
(parallel bits extract), whose functionality is explained by this image (source here):
It takes a value r2 and a mask r3, and packs the bits of r2 selected by the mask contiguously into the low-order bits of r1.
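To make the semantics concrete, here is a minimal bit-by-bit sketch of what the instruction computes for 32-bit operands. The name `pext_reference` is hypothetical and chosen only for illustration; this is a plain reference loop, not an optimized implementation.

```cpp
#include <cstdint>

// Hypothetical helper: a naive reference for the PEXT semantics described above.
// Bits of `value` at positions where `mask` is 1 are packed into the low bits of the result.
inline std::uint32_t pext_reference(std::uint32_t value, std::uint32_t mask) {
    std::uint32_t result = 0;
    unsigned out = 0;                        // next free low-order position in result
    for (unsigned i = 0; i < 32; ++i) {
        if ((mask >> i) & 1u) {              // bit i is selected by the mask
            result |= ((value >> i) & 1u) << out;
            ++out;
        }
    }
    return result;
}
```

For example, with value 0xB4 (10110100) and mask 0x64 (01100100), the selected bits are 0, 1, 1 from high to low, so the result is 3.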
My question is the following: what would be the equivalent code of an optimized templated function in pure standard C++11 that compilers would be likely to optimize into this instruction in the future?
Here is some code from Matthew Fioravante's stdcxx-bitops GitHub repo that was floated to the std-proposals mailing list as a preliminary proposal to add a constexpr bitwise operations library for C++.
#ifndef HAS_CXX14_CONSTEXPR
#define HAS_CXX14_CONSTEXPR 0
#endif
#if HAS_CXX14_CONSTEXPR
#define constexpr14 constexpr
#else
#define constexpr14
#endif
//Parallel Bits Extract
//x    HGFEDCBA
//mask 01100100
//res  00000GFC
//x86_64 BMI2: PEXT
template <typename Integral>
constexpr14 Integral extract_bits(Integral x, Integral mask) {
  Integral res = 0;
  for(Integral bb = 1; mask != 0; bb += bb) {
    if(x & mask & -mask) {
      res |= bb;
    }
    mask &= (mask - 1);
  }
  return res;
}
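As a quick sanity check, the same loop can be exercised on a few inputs. The sketch below repeats it in a hypothetical `extract_bits_unsigned` wrapper (not part of the stdcxx-bitops code) so that the snippet compiles on its own. Restricting it to unsigned types is my assumption: negating a signed mask and doubling a signed `bb` can overflow, so the `static_assert` sidesteps that case.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical unsigned-only variant of the loop above, for illustration.
template <typename Unsigned>
Unsigned extract_bits_unsigned(Unsigned x, Unsigned mask) {
    static_assert(std::is_unsigned<Unsigned>::value,
                  "extract_bits_unsigned expects an unsigned integer type");
    Unsigned res = 0;
    // bb marks the next output bit; each iteration consumes the lowest set bit of mask.
    for (Unsigned bb = 1; mask != 0; bb += bb) {
        // Isolate the lowest set bit of mask (two's-complement negation, written as ~mask + 1).
        const Unsigned lowest = static_cast<Unsigned>(mask & (~mask + 1u));
        if (x & lowest) {
            res |= bb;
        }
        mask &= static_cast<Unsigned>(mask - 1);  // clear the lowest set bit
    }
    return res;
}
```

With x = 0xB4 and mask = 0x64 on `std::uint8_t`, the call returns 3, matching the HGFEDCBA / 00000GFC pattern in the comment. The loop runs once per set bit in the mask rather than once per bit of the type.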