This is a continuation of What is the function parameter equivalent of constexpr? In the original question, we are trying to speed up some code that performs shifts and rotates under Clang and VC++. Clang and VC++ do not optimize the code well because they treat the shift/rotate amount as a variable (i.e., not constexpr).
When I attempt to parameterize the shift amount and the word size, it results in:
$ g++ -std=c++11 -march=native test.cxx -o test.exe
test.cxx:13:10: error: function template partial specialization is not allowed
uint32_t LeftRotate<uint32_t, unsigned int>(uint32_t v)
^ ~~~~~~~~~~~~~~~~~~~~~~~~
test.cxx:21:10: error: function template partial specialization is not allowed
uint64_t LeftRotate<uint64_t, unsigned int>(uint64_t v)
^ ~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.
Here's the test program. It's a bit larger than needed so folks can see we need to handle both uint32_t and uint64_t (not to mention uint8_t, uint16_t and other types).
$ cat test.cxx
#include <iostream>
#include <stdint.h>
template<typename T, unsigned int R>
inline T LeftRotate(unsigned int v)
{
static const unsigned int THIS_SIZE = sizeof(T)*8;
static const unsigned int MASK = THIS_SIZE-1;
return T((v<<R)|(v>>(-R&MASK)));
};
template<uint32_t, unsigned int R>
uint32_t LeftRotate<uint32_t, unsigned int>(uint32_t v)
{
__asm__ ("roll %1, %0" : "+mq" (v) : "I" ((unsigned char)R));
return v;
}
#if __x86_64__
template<uint64_t, unsigned int R>
uint64_t LeftRotate<uint64_t, unsigned int>(uint64_t v)
{
__asm__ ("rolq %1, %0" : "+mq" (v) : "J" ((unsigned char)R));
return v;
}
#endif
int main(int argc, char* argv[])
{
std::cout << "Rotated: " << LeftRotate<uint32_t, 2>((uint32_t)argc) << std::endl;
return 0;
}
I've been through a number of iterations of error messages depending on how I attempt to implement the rotate. Other error messages include no function template matches function template specialization.... Using template <> seems to produce the most incomprehensible one.
How do I parameterize the shift amount in hopes that Clang and VC++ will optimize the function call as expected?
Another way is to turn the templated constant into a constant argument which the compiler can optimise away.
step 1: define the concept of a rotate_distance:
template<unsigned int R> using rotate_distance = std::integral_constant<unsigned int, R>;
step 2: define the rotate functions in terms of overloads of a function which takes an argument of this type:
template<unsigned int R>
uint32_t LeftRotate(uint32_t v, rotate_distance<R>)
Now, if we wish, we can simply call LeftRotate(x, rotate_distance<y>()), which seems to express intent nicely, or we can redefine the 2-argument template form in terms of this form:
template<unsigned int Dist, class T>
T LeftRotate(T t)
{
return LeftRotate(t, rotate_distance<Dist>());
}
Full Demo:
#include <iostream>
#include <stdint.h>
#include <type_traits> // std::integral_constant
template<unsigned int R> using rotate_distance = std::integral_constant<unsigned int, R>;
template<typename T, unsigned int R>
inline T LeftRotate(T v, rotate_distance<R>) // take T so it can be deduced from the argument
{
static const unsigned int THIS_SIZE = sizeof(T)*8;
static const unsigned int MASK = THIS_SIZE-1;
return T((v<<R)|(v>>(-R&MASK)));
}
template<unsigned int R>
uint32_t LeftRotate(uint32_t v, rotate_distance<R>)
{
__asm__ ("roll %1, %0" : "+mq" (v) : "I" ((unsigned char)R));
return v;
}
#if __x86_64__
template<unsigned int R>
uint64_t LeftRotate(uint64_t v, rotate_distance<R>)
{
__asm__ ("rolq %1, %0" : "+mq" (v) : "J" ((unsigned char)R));
return v;
}
#endif
template<unsigned int Dist, class T>
T LeftRotate(T t)
{
return LeftRotate(t, rotate_distance<Dist>());
}
int main(int argc, char* argv[])
{
std::cout << "Rotated: " << LeftRotate((uint32_t)argc, rotate_distance<2>()) << std::endl;
std::cout << "Rotated: " << LeftRotate((uint64_t)argc, rotate_distance<2>()) << std::endl;
std::cout << "Rotated: " << LeftRotate<2>((uint64_t)argc) << std::endl;
return 0;
}
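The demo above relies on GCC-style inline assembly, which VC++ does not accept. Here is a minimal sketch of the same tag-dispatch idea using only plain C++, so the rotate amount is still a compile-time constant inside the function body. The two static_assert checks are my own additions, not part of the original code, and whether a given compiler turns the expression into a single rotate instruction is something to confirm by inspecting the generated assembly.

```cpp
#include <stdint.h>
#include <type_traits>

// Tag type carrying the rotate amount in the type (same alias as in the demo above).
template<unsigned int R>
using rotate_distance = std::integral_constant<unsigned int, R>;

// Portable overload: no inline assembly, R is a constant the optimizer can see.
template<typename T, unsigned int R>
inline T LeftRotate(T v, rotate_distance<R>)
{
    // These checks are assumptions added for this sketch.
    static_assert(std::is_unsigned<T>::value, "LeftRotate expects an unsigned type");
    static_assert(R < sizeof(T) * 8, "rotate amount must be less than the word size");

    const unsigned int MASK = sizeof(T) * 8 - 1;
    // (-R & MASK) is the complementary right shift; it is 0 when R is 0,
    // which avoids shifting by the full word size.
    return T((v << R) | (v >> (-R & MASK)));
}

// Convenience form: LeftRotate<2>(x) forwards to the tagged overload.
template<unsigned int Dist, typename T>
inline T LeftRotate(T v)
{
    return LeftRotate(v, rotate_distance<Dist>());
}
```

Either spelling works at the call site: LeftRotate<2>(x) or LeftRotate(x, rotate_distance<2>()). Type-specific overloads such as the uint32_t one from the demo can still be added alongside, and overload resolution will prefer them for that type.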
Prior to C++11 we had neither std::integral_constant nor alias templates (the using form above), so we have to make our own version.
For our purposes, this is sufficient:
template<unsigned int R> struct rotate_distance {};
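As a rough sketch of how that hand-rolled tag slots in, the overloads look the same as before, only without the alias template or anything from <type_traits>. This version sticks to built-in unsigned types and assumes 8-bit bytes, as the sizeof(T)*8 in the original code already does.

```cpp
// Hand-rolled tag: the rotate amount lives only in the type, the object is empty.
template<unsigned int R> struct rotate_distance {};

// Generic overload; R is deduced from the tag argument.
template<typename T, unsigned int R>
inline T LeftRotate(T v, rotate_distance<R>)
{
    const unsigned int MASK = sizeof(T) * 8 - 1;
    return T((v << R) | (v >> (-R & MASK)));
}

// Convenience form forwarding to the tagged overload.
template<unsigned int Dist, typename T>
inline T LeftRotate(T v)
{
    return LeftRotate(v, rotate_distance<Dist>());
}
```

Because the tag is an empty struct that is never read, passing it by value should cost nothing once the call is inlined.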
full proof - note the effect of optimisations: