linux-kernel embedded-linux atomic inline-assembly arm64

Questions on atomic_add function of arch/arm64/include/asm/atomic.h

I am very new to the Linux kernel's C coding style. I am trying to understand the following implementation of the atomic_add function from the arch/arm64/include/asm/atomic.h file (lines 112-124 of that file).

static inline void atomic_add(int i, atomic_t *v)
{ 
    unsigned long tmp;
    int result;
    asm volatile("// atomic_add\n"
        "1: ldxr    %w0, %2\n"
        "   add %w0, %w0, %w3\n"
        "   stxr    %w1, %w0, %2\n"
        "   cbnz    %w1, 1b"
        : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
        : "Ir" (i));

}
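For comparison, here is a minimal portable sketch of what this loop does, written with C11 atomics instead of inline assembly. The names my_atomic_t and my_atomic_add are hypothetical.

The ldxr/add/stxr/cbnz sequence is a load-exclusive/store-exclusive retry loop. stxr writes a nonzero status into %w1 if the exclusive reservation was lost, and cbnz then branches back to label 1 to try again. The sketch below uses a compare-and-exchange loop instead. That loop retries when the value changed, not when a reservation was lost, so it is an analogue rather than a translation.

Relaxed ordering is assumed, on the understanding that the kernel's non-value-returning atomics such as atomic_add do not imply memory barriers.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's atomic_t. */
typedef struct {
    _Atomic int counter;
} my_atomic_t;

/* Portable analogue of the arm64 atomic_add() retry loop. */
static void my_atomic_add(int i, my_atomic_t *v)
{
    /* Roughly "ldxr": read the current value. */
    int old = atomic_load_explicit(&v->counter, memory_order_relaxed);
    int new_val;

    do {
        /* Roughly "add": compute in unsigned arithmetic so the sum wraps
         * modulo 2^32 like the 32-bit add instruction (the conversion back
         * to int is modulo on GCC and Clang). */
        new_val = (int)((unsigned int)old + (unsigned int)i);
        /* Roughly "stxr" + "cbnz": if another CPU changed the counter in
         * the meantime, the exchange fails, 'old' is refreshed with the
         * current value, and the loop tries again. */
    } while (!atomic_compare_exchange_weak_explicit(&v->counter, &old, new_val,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
}
```

In ordinary user-space code you would simply call atomic_fetch_add_explicit(&v->counter, i, memory_order_relaxed). On arm64 without the LSE atomics extension, compilers may lower that to a similar ldxr/stxr loop.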

Please help me understand the following:

1. What is the meaning of %w0 and %w3? I understand that %2 refers to the counter value.
2. Does %w0 refer to the (result) variable or to a general-purpose register?
3. Does the constraint string "Ir" stand for "Immediate Register"?

Solution

The w in %w0 is an operand modifier. It makes the inline asm contain the 32-bit name of the register (w0, etc.) instead of its 64-bit name (x0), which would be the default. See the GCC documentation on AArch64 operand modifiers. (This modifier was undocumented in GCC 13 and earlier, but it has been supported in practice for a long time, for compatibility with armclang.) You can also try it on Godbolt: if you write %0 instead of %w0, the generated instruction uses the 64-bit x register. That is not what you want, since these should be 32-bit loads and stores.
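Here is a small C model of the 32-bit versus 64-bit distinction. The helper names model_add_w and model_add_x are hypothetical.

The model relies on the AArch64 rule that an instruction writing a W register zeroes the upper 32 bits of the corresponding X register. Using x registers for the ldxr/stxr pair would be wrong for another reason too. A 64-bit exclusive load or store accesses 8 bytes, which means the 4-byte counter plus whatever follows it in memory.

```c
#include <stdint.h>

/* Model of "add wD, wN, wM": the sum is computed in 32 bits, and writing
 * the W register zero-extends the result into the full X register. */
static uint64_t model_add_w(uint64_t xn, uint64_t xm)
{
    uint32_t sum = (uint32_t)xn + (uint32_t)xm;  /* wraps modulo 2^32 */
    return (uint64_t)sum;                        /* bits 63..32 become 0 */
}

/* Model of "add xD, xN, xM": a full 64-bit add. */
static uint64_t model_add_x(uint64_t xn, uint64_t xm)
{
    return xn + xm;
}
```

For example, model_add_w(0xFFFFFFFF, 1) wraps to 0, while model_add_x(0xFFFFFFFF, 1) gives 0x100000000.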
As for whether %w0 refers to the result variable or to a register: both. As usual for GCC-style extended asm, %w0 refers to operand number 0 of the inline asm, with the w modifier selecting its 32-bit name as described above. Here that is the operand declared as "=&r" (result). Since the constraint is r, this operand is allocated a general-purpose register, and every mention of %0 (respectively %w0) in the asm code is replaced with the name of that register. In one Godbolt compilation of this code, for example, the compiler chose x9 (respectively w9).

The (result) part means that after the asm statement, the compiler should take whatever is left in w9 and store it in the variable result. It could do this with a store to memory, with a mov to whatever register is being used for result, or by simply allocating result in that register in the first place. With luck, the optimizer will choose the latter. Since result isn't used for anything after the asm, it should then do nothing further with that register. So, in effect, an output operand bound to a variable that is never used afterwards is a way of telling the compiler "please pick a register that I can use as scratch".
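The operand substitution described above can be modeled in a few lines of C. This is a deliberately simplified, hypothetical model: expand_template and struct operand are made-up names, and this is not how GCC is implemented. The model makes these assumptions:

- Register allocation has already happened.
- Only single-digit %N, %wN and %xN references are handled.
- A memory operand such as "+Q" (v->counter) prints as [xN].

The register numbers in the test (x9 for result, x10 for tmp, x0 for v, x13 for i) are illustrative, echoing the x9 and w13 mentioned in this answer.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one asm operand after register allocation. */
struct operand {
    int reg;    /* number N of the general-purpose register xN/wN */
    int is_mem; /* nonzero for a memory operand addressed through xN */
};

/* Append s to out if it fits; out always stays NUL-terminated. */
static void append(char *out, size_t outsz, size_t *len, const char *s)
{
    size_t slen = strlen(s);
    if (*len + slen >= outsz)
        return;                        /* not enough room: drop this piece */
    memcpy(out + *len, s, slen + 1);   /* also copies the terminating NUL */
    *len += slen;
}

/* Replace %N, %wN and %xN (single digit N) in tmpl with register names. */
static void expand_template(const char *tmpl, const struct operand *ops,
                            char *out, size_t outsz)
{
    size_t len = 0;
    char piece[16];

    if (outsz == 0)
        return;
    out[0] = '\0';
    for (const char *p = tmpl; *p != '\0'; p++) {
        if (*p == '%') {
            const char *q = p + 1;
            char mod = 'x';            /* default: 64-bit name, as for plain %N */
            if (*q == 'w' || *q == 'x')
                mod = *q++;
            if (isdigit((unsigned char)*q)) {
                const struct operand *op = &ops[*q - '0'];
                if (op->is_mem)
                    snprintf(piece, sizeof piece, "[x%d]", op->reg);
                else
                    snprintf(piece, sizeof piece, "%c%d", mod, op->reg);
                append(out, outsz, &len, piece);
                p = q;                 /* skip past the modifier and digit */
                continue;
            }
        }
        piece[0] = *p;                 /* ordinary character: copy as-is */
        piece[1] = '\0';
        append(out, outsz, &len, piece);
    }
}
```

With operands {x9, x10, [x0], x13}, the template "add %w0, %w0, %w3" expands to "add w9, w9, w13", and "ldxr %w0, %2" expands to "ldxr w9, [x0]".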
As for "Ir": it is two separate constraints, I and r. GCC documents both its simple constraints and its machine-specific ones. When several constraints are given, the compiler may choose to satisfy any one of them.

I asks for an immediate value suitable for use in an AArch64 add instruction. That means a compile-time constant that is a 12-bit unsigned number, optionally shifted left by 12 bits. r, as you know, asks for a general-purpose register.

So suppose you write atomic_add(1, &c), atomic_add(1+1+1, &c), atomic_add(4095, &c) or atomic_add(4096, &c). Then the add instruction inside the asm statement is emitted as an immediate add, with your constant encoded directly in the instruction: add w9, w9, #1 and so on. But if you write atomic_add(4097, &c) or atomic_add(my_variable, &c), the compiler generates additional code before the asm to load the value into some register (say w13) and emits add w9, w9, w13 inside your asm. This lets the compiler use the more efficient immediate add whenever possible, while still producing correct code in general.
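The range of constants that I accepts can be written out as a small C predicate. fits_add_immediate is a hypothetical helper. It mirrors the rule stated above: a 12-bit unsigned value, optionally shifted left by 12. It simply rejects negative values rather than trying to model how GCC handles them.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if value can be encoded as an AArch64 ADD immediate:
 * either #imm12 or #imm12, lsl #12. */
static bool fits_add_immediate(int64_t value)
{
    if (value < 0)
        return false;                       /* not modeled in this sketch */
    if (value <= 0xFFF)
        return true;                        /* #imm12 */
    if ((value & 0xFFF) == 0 && (value >> 12) <= 0xFFF)
        return true;                        /* #imm12, lsl #12 */
    return false;
}
```

With this predicate, 1, 3, 4095 and 4096 fit (4096 is #1, lsl #12), while 4097 does not. That matches the cases above where the compiler has to fall back to the r alternative.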