Search code examples
c++recursionc++14bittemplate-meta-programming

Bitswap function using template metaprogramming


The bit twiddling hacks website propose the following very efficient function to reverse bits:

// Bitswap: reverse the bits of the value of unsigned integral type T
template <class T>
constexpr T bitswap(T src)
{
    constexpr std::size_t char_bit = std::numeric_limits<unsigned char>::digits;
    constexpr std::size_t digits = sizeof(T) * char_bit;
    std::size_t size = digits;
    T mask = ~T();        
    while ((size >>= 1) > 0) {
        mask ^= (mask << size);
        src = ((src >> size) & mask) | ((src << size) & ~mask);
    }
    return src;
}
  • Is there any way to speed up this function by using template metaprogramming recursion to unroll the loop?
  • Is there a way to make it work with extended types such as __uint128_t? (the original version works with __uint128_t)
  • Does this function work theoretically to reverse the bits of types with a non-power of two number of bits if digits is correctly initialized to the correct number of bits? (for example a hypotherical uint41_t).

Solution

  • This is a little utility function that lets you unpack parameter packs inline into a lambda in C++14:

    template<class=void, std::size_t...Is>
    constexpr auto indexer(std::index_sequence<Is...>) {
      return [](auto&& f) {
        using discard=int[];
        (void)discard{0,(void(
          f(std::integral_constant<std::size_t, Is>{})
        ),0)...};
      };
    }
    template<std::size_t N>
    constexpr auto indexer() {
      return indexer( std::make_index_sequence<N>{} );
    }
    

    Next we need a compile time log function:

    constexpr std::size_t ct_log_2( std::size_t N ) {
      return (N>1)?1+ct_log_2(N>>1):0;
    }
    

    we then put these together:

    template <class T>
    constexpr T bitswap(T src)
    {
      constexpr std::size_t char_bit = std::numeric_limits<unsigned char>::digits;
      static_assert(char_bit == 8);
      constexpr std::size_t digits = sizeof(T) * char_bit;
      T mask = ~T();        
    
      auto expand = indexer<ct_log_2(digits)>();
      expand([&](auto i){
        constexpr auto size = digits >> (i+1);
        mask ^= (mask << size);
        src = ((src >> size) & mask) | ((src << size) & ~mask);
      });
    
      return src;
    }
    

    Sadly this requires a C++17 feature of constexpr lambdas. However the indexer work can be turned into a verbose manual implementation.

    Create a constexpr size calculator:

    template<std::size_t digits, std::size_t I>
    constexpr auto size_calc = (digits >> (I+1));
    

    Replace the expand section with:

      using discard=int[];
      (void)discard{0,(void((
        void( mask ^= (mask << size_calc<digits, Is>) ),
        void( src = ( (src >> size_calc<digits, Is> ) & mask ) | ((src << size_calc<digits, Is>) & ~mask) ),
        0
      )),0)...};
    

    where we have manually expanded what expand does for us (isn't it ugly?), then have a one-argument version call:

      return bitswap(src, std::make_index_sequence< ct_log_2(digits) >{} );
    

    with the right index sequence.

    The result should be equivalent.

    Live example.

    Some compilers resist inlining deep recursive calls. The recursion depth here is 1 to 3 (to get the parameter pack growth). Now a naive recursive solution is only log_2(128) or 6, so this might be overkill.