
Standard C++11 code equivalent to the PEXT Haswell instruction (and likely to be optimized by compilers)

The Haswell architecture comes with several new instructions. One of them is PEXT (parallel bits extract), whose functionality is explained by this image (source here):


It takes a value r2 and a mask r3, extracts the bits of r2 at the positions where r3 has a 1 bit, and packs them contiguously into the low-order bits of r1.
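To make the semantics concrete, here is a deliberately naive sketch that walks all 32 bit positions of a 32-bit mask and copies each selected bit of the value into the next free low-order bit of the result. The name `pext_reference` is hypothetical and the fixed 32-bit width is an assumption for illustration; it models what the instruction computes, not how a compiler would lower it.

```cpp
#include <cstdint>

// Hypothetical reference model of PEXT on 32-bit operands.
// Scan mask bits from least to most significant; for every 1 bit in the
// mask, copy the bit of `value` at that position into the next free
// low-order position of the result.
inline std::uint32_t pext_reference(std::uint32_t value, std::uint32_t mask) {
    std::uint32_t result = 0;
    unsigned out = 0;  // next destination bit position in the result
    for (unsigned i = 0; i < 32; ++i) {
        const std::uint32_t bit = std::uint32_t{1} << i;
        if (mask & bit) {
            if (value & bit) {
                result |= std::uint32_t{1} << out;
            }
            ++out;  // each selected position consumes one output bit
        }
    }
    return result;
}
```

For example, with value `0x44` (bits 2 and 6 set) and mask `0x64` (bits 2, 5 and 6 set), the selected bits read 1, 0, 1 from low to high, so `pext_reference(0x44u, 0x64u)` yields `0b101`, i.e. 5.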

My question is the following: what would be the equivalent code for an optimized template function in pure standard C++11 that compilers would be likely to optimize to this instruction in the future?
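Since nothing guarantees that a compiler will recognize a portable loop as PEXT, one pragmatic pattern is to call the BMI2 intrinsic when the target supports it and keep a portable loop as the fallback. The sketch below assumes GCC or Clang, where `__BMI2__` is defined when compiling with `-mbmi2` or `-march=haswell`, and where `_pext_u32` is declared in `<immintrin.h>`; the wrapper name `pext32` is hypothetical.

```cpp
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>  // declares _pext_u32 / _pext_u64 on GCC and Clang
#endif

// Hypothetical wrapper: use the hardware instruction when the compiler is
// targeting BMI2, otherwise compute the same result portably.
inline std::uint32_t pext32(std::uint32_t x, std::uint32_t mask) {
#if defined(__BMI2__)
    return _pext_u32(x, mask);
#else
    std::uint32_t res = 0;
    // bb is the next output bit; it doubles once per set bit of mask.
    for (std::uint32_t bb = 1; mask != 0; bb += bb) {
        if (x & mask & -mask) {  // mask & -mask isolates the lowest set bit
            res |= bb;
        }
        mask &= mask - 1;  // clear the lowest set bit of mask
    }
    return res;
#endif
}
```

The fallback branch uses the same lowest-set-bit technique as `extract_bits` in the answer below, specialized to `std::uint32_t`, so both branches return identical results.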


  • Here is some code from Matthew Fioravante's stdcxx-bitops GitHub repo that was floated to the std-proposals mailing list as a preliminary proposal to add a constexpr bitwise operations library to C++.

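    #ifndef HAS_CXX14_CONSTEXPR
    #define HAS_CXX14_CONSTEXPR 0
    #endif

    #if HAS_CXX14_CONSTEXPR
    #define constexpr14 constexpr
    #else
    #define constexpr14
    #endif

    //Parallel Bits Extract
    //x    HGFEDCBA
    //mask 01100100
    //res  00000GFC
    //x86_64 BMI2: PEXT
    //Intended for unsigned types: -mask on the most negative signed value is undefined.
    template <typename Integral>
    constexpr14 Integral extract_bits(Integral x, Integral mask) {
      Integral res = 0;
      for(Integral bb = 1; mask != 0; bb += bb) {  // bb: next output bit
        if(x & mask & -mask) {  // bit of x under the lowest set bit of mask
          res |= bb;
        }
        mask &= (mask - 1);  // clear the lowest set bit of mask
      }
      return res;
    }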
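The `constexpr14` macro exists because a C++11 `constexpr` function body must essentially consist of a single `return` statement, so the loop above can only be `constexpr` under C++14. If compile-time evaluation is needed in strict C++11, the same iteration can be expressed recursively. This is a sketch, not part of the stdcxx-bitops code; `extract_bits11` and its defaulted third parameter are hypothetical, and like the original it is intended for unsigned types.

```cpp
#include <cstdint>

// Hypothetical C++11-constexpr variant of extract_bits. Each call handles the
// lowest set bit of mask, then recurses with that bit cleared and the output
// bit bb doubled. Recursion depth equals the number of set bits in mask.
template <typename Integral>
constexpr Integral extract_bits11(Integral x, Integral mask, Integral bb = 1) {
    return mask == 0
        ? Integral(0)
        : Integral(((x & mask & -mask) ? bb : Integral(0)) |
                   extract_bits11(x, Integral(mask & (mask - 1)),
                                  Integral(bb + bb)));
}

// Compile-time check of the documented example: x = HGFEDCBA,
// mask = 01100100 -> res = 00000GFC. With x = 0x44, G = 1, F = 0, C = 1.
static_assert(extract_bits11<std::uint8_t>(0x44, 0x64) == 0x5,
              "expected 0b101");
```

The explicit template argument keeps `x`, `mask` and `bb` in the same type, which matters because the defaulted `bb` would otherwise not take part in deduction.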