The GCC documentation (https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers-1) contains the following PowerPC example and description:
    static void
    dgemv_kernel_4x4 (long n, const double *ap, long lda,
                      const double *x, double *y, double alpha)
    {
      double *a0;
      double *a1;
      double *a2;
      double *a3;

      __asm__
        (
         /* lots of asm here */
         "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
         "#a0=%3 a1=%4 a2=%5 a3=%6"
         :
           "+m" (*(double (*)[n]) y),
           "+&r" (n),   // 1
           "+b" (y),    // 2
           "=b" (a0),   // 3
           "=&b" (a1),  // 4
           "=&b" (a2),  // 5
           "=&b" (a3)   // 6
         :
           "m" (*(const double (*)[n]) x),
           "m" (*(const double (*)[]) ap),
           "d" (alpha), // 9
           "r" (x),     // 10
           "b" (16),    // 11
           "3" (ap),    // 12
           "4" (lda)    // 13
         :
           "cr0",
           "vs32","vs33","vs34","vs35","vs36","vs37",
           "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
         );
    }
... On the other hand, `ap` can’t be the same as any of the other inputs, so an early-clobber on `a0` is not needed. It is also not desirable in this case. An early-clobber on `a0` would cause GCC to allocate a separate register for the `"m" (*(const double (*)[]) ap)` input. Note that tying an input to an output is the way to set up an initialized temporary register modified by an `asm` statement. An input not tied to an output is assumed by GCC to be unchanged...
I am totally confused by this description, specifically the part about `"m" (*(const double (*)[]) ap)` and `"=b" (a0)`. `"=b" (a0)` will share a register with `"3" (ap)`, which holds the address of the input parameter, and `"m" (*(const double (*)[]) ap)` is the content of the first element of `ap`, so why would an early-clobber on `a0` affect `"m" (*(const double (*)[]) ap)`?

As for `"m" (*(const double (*)[]) ap)`, I still don't understand what the problem is. Since `"=b" (a0)` and `"3" (ap)` are tied, can't we still read and write through the register allocated for `"=b" (a0)`?

This is an efficiency consideration, not a correctness one: it stops GCC from wasting instructions (and creating register pressure).

`"m" (*(const double (*)[]) ap)` isn't the first element; it's an arbitrary-length array, which lets the compiler know that the entire array object is an input. But it's a dummy input: the asm template won't actually use that operand, and instead loops over the array via the pointer input `"3" (ap)`.
See "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?" for more about this technique.
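The dummy-memory-operand technique can be sketched with a small, self-contained function. This is only an illustration under stated assumptions, not the kernel from the GCC manual: the function name `sum_longs` is hypothetical, the asm path assumes x86-64 with an LP64 ABI and GCC-style AT&T syntax, and other targets fall back to plain C. The template walks the array through a pointer register and never references the `"m"` operand, which exists only to tell the compiler that the whole array is read.

```c
#include <stddef.h>

/* Hypothetical example: sum n longs starting at p. */
static long sum_longs(const long *p, size_t n)
{
    long total = 0;
#if defined(__GNUC__) && defined(__x86_64__) && defined(__LP64__)
    const long *cur = p;   /* pointer the template actually walks */
    size_t left = n;       /* remaining element count */
    __asm__ (
        "test %[left], %[left]\n\t"
        "jz 2f\n"
        "1:\n\t"
        "add (%[cur]), %[total]\n\t"   /* total += *cur */
        "add $8, %[cur]\n\t"           /* advance by sizeof(long) == 8 */
        "dec %[left]\n\t"
        "jnz 1b\n"
        "2:"
        : [total] "+r" (total), [cur] "+r" (cur), [left] "+r" (left)
        : "m" (*(const long (*)[]) p)  /* dummy input: entire array is read */
        : "cc");
#else
    /* Portable fallback for targets this sketch does not cover. */
    for (size_t i = 0; i < n; i++)
        total += p[i];
#endif
    return total;
}
```

Without the dummy `"m"` input (or a `"memory"` clobber), the compiler would be entitled to assume the asm statement reads only its register operands, and could reorder or drop earlier stores to the array.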
But `"m"` inputs are real inputs as far as GCC is concerned: if the template does use them, they have to expand to an addressing mode that still works, including after early-clobber outputs have overwritten their registers.
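With `"=&b" (a0)` / `"3" (ap)`, GCC couldn't pick the same register as the base of the addressing mode for `"m" (*(const double (*)[]) ap)`, because an early-clobber output may not share a register with any part of a memory address used by the instruction.
So it would have to spend an extra instruction ahead of the asm statement copying the address to another register, and that copy would also tie up an extra integer register.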
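To see the effect in isolation, here is a minimal sketch with two hypothetical functions, `tie_plain` and `tie_earlyclobber`, that differ only in the `&` modifier. It assumes a GCC-compatible compiler and uses the generic `"r"` constraint instead of PowerPC's `"b"` so it builds on any target; the empty template stands in for a real kernel. Both functions return the same pointer, so the difference is visible only in the generated assembly (for example with `gcc -O2 -S`), and the exact output depends on the compiler version and target.

```c
/* Plain output tied to the pointer input: the address of the "m"
   operand may live in the same register as a0. */
static const double *tie_plain(const double *ap)
{
    const double *a0;
    __asm__ ("" /* a real kernel would walk the array through %0 here */
             : "=r" (a0)
             : "m" (*(const double (*)[]) ap),  /* dummy whole-array input */
               "0" (ap));                       /* tied to operand 0 (a0) */
    return a0;
}

/* Early-clobber output tied to the same input: the "m" operand's
   address is no longer allowed to use a0's register, so the compiler
   may need to keep a second copy of ap in another register. */
static const double *tie_earlyclobber(const double *ap)
{
    const double *a0;
    __asm__ (""
             : "=&r" (a0)
             : "m" (*(const double (*)[]) ap),
               "0" (ap));
    return a0;
}
```

Comparing the two functions in the compiler's assembly output is a quick way to check whether the early-clobber costs an extra register copy on a given target.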