c, assembly, x86, gnu-assembler

How to write rotate code in C to compile into the `ror` x86 instruction?


I have some code that rotates my data. I know GAS syntax has a single assembly instruction that can rotate an entire byte. However, when I try to follow any of the advice in "Best practices for circular shift (rotate) operations in C++", my C code compiles into at least 5 instructions that use three registers, even when compiling with -O3. Maybe those are best practices in C++, and not in C?

In either case, how can I force C to use the ROR x86 instruction to rotate my data?

The precise line of code which is not getting compiled to the rotate instruction is:

value = (((y & mask) << 1 ) | (y >> (size-1))) //rotate y right 1
       ^ (((z & mask) << n ) | (z >> (size-n))) // rotate z left by n
// size can be 64 or 32, depending on whether we are rotating a long or an int, and 
// mask would be 0xff or 0xffffffff, accordingly

I do not mind using __asm__ __volatile__ to do this rotate, if that's what I must do. But I don't know how to do so correctly.


Solution

  • You might need to be a bit more specific about which integral type / width you're rotating, and whether the rotation count is fixed or variable. ror{b,w,l,q} (8, 16, 32, 64-bit) has forms that take an implicit count of 1, an imm8, or the %cl register. As an example:

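    #include <stddef.h>  /* size_t */
    #include <stdint.h>  /* uint32_t */

    /* Rotate u right by r bits; the "c" constraint puts the count in %cl. */
    static inline uint32_t rotate_right (uint32_t u, size_t r)
    {
        __asm__ ("rorl %%cl, %0" : "+r" (u) : "c" (r));
        return u;
    }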
    

    I haven't tested this; it's just off the top of my head. I'm sure multiple-constraint syntax could be used to optimize cases where r is a constant, so that %ecx/%rcx is left alone.
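A rough sketch of what such a multiple-constraint version might look like, assuming GCC-style extended asm on x86. The name rotate_right32 is made up for illustration. The "cI" constraint lets the compiler choose between the %cl register and an immediate in the range 0..31, and the fallback branch is only there so the snippet also builds on other targets:

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value right by r bits (modulo 32). */
static inline uint32_t rotate_right32(uint32_t u, unsigned r)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "c" = count in %cl, "I" = compile-time constant 0..31 (x86 constraint).
       The uint8_t cast makes a register operand print as %cl rather than
       %ecx. The rotate changes the flags, hence the "cc" clobber. */
    __asm__ ("rorl %1, %0" : "+r" (u) : "cI" ((uint8_t)r) : "cc");
    return u;
#else
    /* Plain C fallback for non-x86 targets or other compilers. */
    r &= 31;
    return (u >> r) | (u << ((32 - r) & 31));
#endif
}
```

With optimization enabled and a constant count, for example rotate_right32(x, 8), the compiler is free to pick the immediate form and leave %ecx untouched. With a variable count it falls back to %cl.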


    If you're using a recent version of gcc or clang (or even icc), the intrinsics header <x86intrin.h> may provide __ror{b|w|d|q} intrinsics. I haven't tried them.
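For comparison, here is a minimal sketch of the rotate idiom in plain C, with no inline asm. The function names are illustrative only. Recent versions of GCC and Clang generally recognise this pattern and can reduce each function to a single ror/rol when optimizing, though that depends on the compiler and flags. Masking the count keeps every shift below the type width, so a count of 0 does not trigger undefined behaviour:

```c
#include <stdint.h>

/* Rotate right, 32-bit. -n & 31 equals (32 - n) mod 32, so n == 0 is safe. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << (-n & 31));
}

/* Rotate left, 32-bit. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> (-n & 31));
}

/* Rotate right, 64-bit. */
static inline uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << (-n & 63));
}

/* Rotate left, 64-bit. */
static inline uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> (-n & 63));
}
```

Going by the comments in the question (y rotated right by 1, z rotated left by n), the 32-bit expression would then read value = rotr32(y, 1) ^ rotl32(z, n); with the 64-bit helpers used for the long case.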