I have some code that rotates my data. I know GAS syntax has a single assembly instruction that can rotate an entire byte. However, when I try to follow any of the advice on Best practices for circular shift (rotate) operations in C++, my C code compiles into at least 5 instructions, which use up three registers-- even when compiling with -O3. Maybe those are best practices in C++, and not in C?
In either case, how can I force C to use the ROR x86 instruction to rotate my data?
The precise line of code which is not getting compiled to the rotate instruction is:
value = (((y & mask) << 1 ) | (y >> (size-1))) //rotate y right 1
^ (((z & mask) << n ) | (z >> (size-n))) // rotate z left by n
// size can be 64 or 32, depending on whether we are rotating a long or an int, and
// mask would be 0xff or 0xffffffff, accordingly
I do not mind using __asm__ __volatile__
to do this rotate, if that's what I must do. But I don't know how to do so correctly.
You might need to be a bit more specific with what integral type / width you're rotating, and whether you have a fixed or variable rotation. ror{b,w,l,q}
(8, 16, 32, 64-bit) has forms for (1)
, imm8
, or the %cl
register. As an example:
static inline uint32_t rotate_right (uint32_t u, size_t r)
{
__asm__ ("rorl %%cl, %0" : "+r" (u) : "c" (r));
return u;
}
I haven't tested this, it's just off the top of my head. And I'm sure multiple constraint syntax could be used to optimize cases where a constant (r)
value is used, so %e/rcx
is left alone.
If you're using a recent version of gcc or clang (or even icc). The intrinsics header <x86intrin.h>
, may provide __ror{b|w|d|q}
intrinsics. I haven't tried them.