c, assembly, gcc, inline-assembly

Why is there a combination of earlyclobber and matching constraints in C inline assembly?


I came across this code in the Linux kernel, in arch/x86/lib/usercopy_32.c:

#define __copy_user(to, from, size)                 \
do {                                    \
    int __d0, __d1, __d2;                       \
    __asm__ __volatile__(                       \
        "   cmp  $7,%0\n"                   \
        "   jbe  1f\n"                  \
        "   movl %1,%0\n"                   \
        "   negl %0\n"                  \
        "   andl $7,%0\n"                   \
        "   subl %0,%3\n"                   \
        "4: rep; movsb\n"                   \
        "   movl %3,%0\n"                   \
        "   shrl $2,%0\n"                   \
        "   andl $3,%3\n"                   \
        "   .align 2,0x90\n"                \
        "0: rep; movsl\n"                   \
        "   movl %3,%0\n"                   \
        "1: rep; movsb\n"                   \
        "2:\n"                          \
        ".section .fixup,\"ax\"\n"              \
        "5: addl %3,%0\n"                   \
        "   jmp 2b\n"                   \
        "3: lea 0(%3,%0,4),%0\n"                \
        "   jmp 2b\n"                   \
        ".previous\n"                       \
        _ASM_EXTABLE(4b,5b)                 \
        _ASM_EXTABLE(0b,3b)                 \
        _ASM_EXTABLE(1b,2b)                 \
        : "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2)   \
        : "3"(size), "0"(size), "1"(to), "2"(from)      \
        : "memory");                        \
} while (0)

This is a bit complex, but basically it just copies size bytes from address from to address to. The details of the code above are not really important; I show it only to demonstrate that there is real usage that combines earlyclobber and a matching constraint.

My question is:

Why do we need earlyclobber when we already have a matching constraint? Earlyclobber is meant to prevent the same register from being used for an input and an output, but a matching constraint means they must use the same register. Isn't this contradictory?


Solution

  • You're right, early-clobber isn't necessary here: every input operand uses a matching constraint, so there is no input for which the compiler has a choice of register. Each output that is tied to an input this way is effectively a read+write operand, equivalent to a "+r" output.

    It's not contradictory. Early-clobber means the output may be written before the template has finished reading all of its inputs. If you had a statement like this:

    asm("..." : "=&r"(dst) : "0"(x), "r"(y) : "memory");
    

    The early clobber would be necessary if the template writes dst before reading y. It would stop the "r" constraint from picking the same register as %0, which it might in a case where the compiler knew x == y.

    Using a matching constraint to force x to pick the same register as the dst output is fine; the compiler will do what you tell it. This would make sense if the template is something like "inc %0; imul %2, %0" which reads+writes %0, and then reads y (and the new value of %0).

    It doesn't need to read x after dst is written, only y, so it's fine to force x into the same register as dst.
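
As a minimal sketch of the case described above, the following fills in the "inc; imul" template for GCC on x86. The function names inc_then_mul and inc_then_mul_rw are hypothetical, chosen only for illustration, and the plain-C fallback for other targets is an assumption added so the snippet compiles everywhere. The second function expresses the same operand as a single "+&r" read+write output, which should be equivalent to tying an input to the output with "0".

```c
/* Hypothetical helper: computes (x + 1) * y.
 * dst is written (incl) before y is read (imull), so dst is marked
 * early-clobber to keep y out of its register. The "0" matching
 * constraint still places x in dst's register, which is fine because
 * x is only read before dst is overwritten. */
static int inc_then_mul(int x, int y)
{
    int dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("incl %0\n\t"       /* dst = x + 1 (writes %0 early)  */
            "imull %2, %0"      /* dst *= y    (reads %2 late)    */
            : "=&r"(dst)
            : "0"(x), "r"(y)
            : "cc");
#else
    dst = (x + 1) * y;          /* portable fallback (assumption) */
#endif
    return dst;
}

/* Same computation with one read+write operand instead of an
 * output plus a matching input. Note that y is now operand %1. */
static int inc_then_mul_rw(int x, int y)
{
    int dst = x;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("incl %0\n\t"
            "imull %1, %0"
            : "+&r"(dst)
            : "r"(y)
            : "cc");
#else
    dst = (dst + 1) * y;        /* portable fallback (assumption) */
#endif
    return dst;
}
```

With both versions, inc_then_mul(3, 3) should give 12. If the & were dropped and the compiler could prove x == y, it would be free to put y in the same register as dst, and the template would then compute (x + 1) * (x + 1) = 16 instead.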