gcc inline-assembly

What is the GCC documentation and example saying about inline asm and not using early clobbers so a pointer shares a register with a mem input?


The GCC documentation (https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers-1) contains the following PowerPC example and description:

static void
dgemv_kernel_4x4 (long n, const double *ap, long lda,
                  const double *x, double *y, double alpha)
{
  double *a0;
  double *a1;
  double *a2;
  double *a3;

  __asm__
    (
     /* lots of asm here */
     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
     "#a0=%3 a1=%4 a2=%5 a3=%6"
     :
       "+m" (*(double (*)[n]) y),
       "+&r" (n),  // 1
       "+b" (y),   // 2
       "=b" (a0),  // 3
       "=&b" (a1), // 4
       "=&b" (a2), // 5
       "=&b" (a3)  // 6
     :
       "m" (*(const double (*)[n]) x),
       "m" (*(const double (*)[]) ap),
       "d" (alpha),    // 9
       "r" (x),        // 10
       "b" (16),   // 11
       "3" (ap),   // 12
       "4" (lda)   // 13
     :
       "cr0",
       "vs32","vs33","vs34","vs35","vs36","vs37",
       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
     );
}

... On the other hand, ap can’t be the same as any of the other inputs, so an early-clobber on a0 is not needed. It is also not desirable in this case. An early-clobber on a0 would cause GCC to allocate a separate register for the "m" (*(const double (*)[]) ap) input. Note that tying an input to an output is the way to set up an initialized temporary register modified by an asm statement. An input not tied to an output is assumed by GCC to be unchanged...

I am totally confused about this description:

  1. In the code there is no apparent relationship between "m" (*(const double (*)[]) ap) and "=b" (a0). "=b" (a0) shares a register with "3" (ap), which holds the address passed in as the input parameter, while "m" (*(const double (*)[]) ap) is the content of the first element of ap. So why would an early-clobber on a0 affect "m" (*(const double (*)[]) ap)?
  2. Even if GCC allocates a new register for "m" (*(const double (*)[]) ap), I still don't understand what the problem is. Since "=b" (a0) is tied to "3" (ap), can't we still read and write through the register allocated for "=b" (a0)?

Solution

  • This is about efficiency, not correctness: leaving out the early-clobber stops GCC from wasting instructions (and adding register pressure).


    "m" (*(const double (*)[]) ap) isn't the first element; it's an arbitrary-length array, which tells the compiler that the entire array object is an input. But it's a dummy input: the asm template never actually uses that operand, and instead loops over the array via the pointer input "3" (ap).

    See "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?" for more about this technique.

    But "m" inputs are real inputs that must expand to a working addressing mode if the template does use them, even after any early-clobber output has been written. So they can't depend on a register that an early-clobber output overwrites.

    With "=&b" (a0) / "3" (ap), GCC couldn't pick the same register as the base of the addressing mode for "m" (*(const double (*)[]) ap).
    So it would have to spend an instruction ahead of the asm statement copying the address to another register, which also ties up that extra integer register.
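
To make the comparison concrete, here is a minimal sketch of the two operand lists side by side. It is not the code from the GCC manual. It assumes GNU C (GCC or Clang), swaps the PowerPC-specific "b" constraint for the generic "r" so it builds on other targets, and uses an empty asm template where a real kernel would have its loads. The function names tied_shared and tied_earlyclobber are made up for this example, and n is assumed to be nonzero.

```c
#include <stddef.h>

/* Variant recommended by the GCC docs: no early-clobber on a0, so the
   compiler may use a0's register as the base address of the dummy "m"
   operand as well. */
static const double *tied_shared(const double *ap, size_t n)
{
  const double *a0;
  __asm__ (""  /* empty template; a real kernel would load through %0 and advance it */
           : "=r" (a0)                         /* %0: output, enters the asm holding ap */
           : "m" (*(const double (*)[n]) ap),  /* %1: dummy input covering the whole array */
             "0" (ap));                        /* %2: tied to %0, supplies its initial value */
  return a0;
}

/* Hypothetical variant with an early-clobber on a0: the "m" operand may no
   longer use a0's register for its address, so the compiler needs the
   address in some other register too. */
static const double *tied_earlyclobber(const double *ap, size_t n)
{
  const double *a0;
  __asm__ (""
           : "=&r" (a0)                        /* %0: early-clobber output */
           : "m" (*(const double (*)[n]) ap),  /* %1: must stay valid after %0 is written */
             "0" (ap));                        /* %2: still tied to %0 */
  return a0;
}
```

Both functions return ap unchanged, because the template is empty and %0 starts out tied to ap. Any difference shows up only in the generated code: comparing the output of gcc -O2 -S for the two functions should show whether the early-clobber version needs a second register for the address of the "m" operand, though the exact result depends on the target and compiler version.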