c arm inline-assembly cortex-m bare-metal

Inserting inline assembly code into C function - I/O questions


I am developing an embedded C application for my Cortex M3 microcontroller using the GNU arm-none-eabi toolchain.

I plan to adopt an assembly subroutine that the vendor implemented into my C application. I plan to make a new C function and, within it, write an inline assembly block using extended inline assembly. In this post I will treat the assembly subroutine as a black box and ask how to structure the inputs and clobber list; the routine has no outputs.

The assembly subroutine expects r0, r1, and r2 to be pre-set prior to the call. Further, the subroutine uses registers r4, r5, r6, r7, r8, r9 as scratch registers to do its function. It writes to a range of memory on the device, specified by r0 and r1 which are the start and stop addresses, respectively.

So, I am checking if my assumptions are correct. My questions follow.

  1. Is this the right way to write my function?
void my_asm_ported_func(int reg_r0, int reg_r1, int reg_r2) {
    __asm__ __volatile__ (
        "ldr r0, %0        \n\t"
        "ldr r1, %1        \n\t"
        "ldr r2, %2        \n\t"
        "<vendor code...>  "
        :  /* no outputs */
        : "r" (reg_r0), "r" (reg_r1), "r" (reg_r2) /* inputs */
        : "r0", "r1", "r2", "r4", "r5", "r6", 
          "r7", "r8", "r9", "memory" /* clobbers */
    );
}
  2. Since this asm subroutine writes to a range of other memory on the device, is adding "memory" to the clobber list enough? Seems too simple.

  3. Is there a more elegant way to feed in r0 - r2 from the input parameters of the surrounding C function? I understand from AAPCS that registers r0-r3 hold input parameters 1-4, so it seems redundant to feed the r0-r2 inputs manually like I did in the input list. Should I somehow just have this be a pure assembly function in a separate .S file?

Thank you in advance.

I tried the above using basic inline assembly instead, with terrible results: it crashed. I did it that way because I thought the assembly block would naturally receive r0-r2 via the function prologue. Evidently it did, because it wrote the memory correctly, but it crashed once my breakpoint at the beginning of the asm block was hit (my VS Code extension doesn't have a step-by-step disassembly view, so it just runs the block as a black box). I haven't tried extended asm yet; I have been doing a lot of reading on this, so I just wanted to make sure my black box approach should work and that I'm not missing anything too big.


Solution

  • Yes, a volatile asm with a "memory" clobber is fine for MMIO (or pretty much anything that's supported at all): the compiler will make sure the asm it generates has memory contents in sync with the C abstract machine before the asm statement, and will assume that any globally-reachable memory has changed after. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for a more in-depth explanation of why this matters when the pointed-to memory is C variables that you also access outside inline asm, not just MMIO registers.
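As a minimal sketch of what the "memory" clobber buys you, the snippet below uses an empty asm template as a stand-in for the vendor routine. The names device_buf, compiler_barrier and fill_then_read are hypothetical and not from the question. Because the template is empty, it emits no instructions and should build with GCC or Clang on any target.

```c
#include <stdint.h>

/* Hypothetical buffer standing in for the memory range the routine touches.
   It has external linkage, so it is globally reachable. */
uint32_t device_buf[4];

/* Empty template: no instructions are emitted, but the "memory" clobber
   tells the compiler that any globally reachable memory may be read or
   written here. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__ ("" ::: "memory");
}

uint32_t fill_then_read(uint32_t value)
{
    device_buf[0] = value;   /* this store must be emitted before the asm */
    compiler_barrier();      /* stand-in for the vendor routine */
    return device_buf[0];    /* compiler must assume the asm may have changed it */
}
```

With a real template in place of the empty string, the same clobber is what lets the asm safely read and write the range given by the start and stop addresses.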


    Registers

    To avoid wasted instructions, tell the compiler which registers you want inputs in, or better let the compiler pick and change the "vendor code" to use %0 instead of the hard register r0.

    Filling in your ldr r0, %0 template string produces something like ldr r0, r3, which is either invalid or treats the source register name as a symbol name. Either way, it doesn't get the function arg into r0, because the "r0" clobber forces the compiler to put that input in a different register. If you did want to copy between registers, the ARM instruction for that is mov. But if that's the first instruction of an asm template string, it usually means you're doing it wrong and should use better constraints to tell the compiler what you want.

    // Worse way, but can use a template string with hard-coded registers unchanged
    
    void my_asm_ported_func(int a, int b, int c)
    {
        register int reg_r0 asm ("r0") = a;  // forces "r" to pick r0 for an asm template
        register int reg_r1 asm ("r1") = b;  // no other *guaranteed* effects.
        register int reg_r2 asm ("r2") = c;
    
        __asm__ __volatile__ (
                     // no extra mov or load instructions
            "<vendor code...>  "   // still unchanged
    
            : "+r" (reg_r0), "+r" (reg_r1), "+r" (reg_r2) // read-write outputs
            : // no pure inputs
            : "r4", "r5", "r6", 
              "r7", "r8", "r9", "memory" // clobbers
        );
    }
    

    Best way

    void my_asm_ported_func(int reg_r0, int reg_r1, int reg_r2) {
    
        __asm__ __volatile__ (
             // no extra mov or load instructions.
            "<vendor code changed to use %0 instead of r0, etc...>  "
    
            : "+r" (reg_r0), "+r" (reg_r1), "+r" (reg_r2) // read-write outputs
            : // no pure inputs
            : "r4", "r5", "r6", 
              "r7", "r8", "r9", "memory" // clobbers.  Not including r3??
        );
    
        // the C variables reg_r0 and so on have modified values here
        // but they're local to this function so no effect outside of this
    }
    

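    Actually, a further improvement would be to replace the register clobbers "r4" through "r9" with dummy output operands such as "=&r"(dummy1), letting the compiler pick which registers to clobber. The & (early-clobber) modifier keeps the compiler from giving a scratch register the same register as a pure input.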
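A minimal sketch of that idea, assuming a made-up two-instruction template in place of the vendor code: add_via_scratch and scratch are hypothetical names. The asm is only compiled when __arm__ is defined; on other hosts a plain C fallback is used so the sketch still builds.

```c
#include <stdint.h>

/* Hypothetical stand-in for vendor code that needs one scratch register. */
uint32_t add_via_scratch(uint32_t a, uint32_t b)
{
    uint32_t result;
#if defined(__arm__)
    uint32_t scratch;  /* dummy output: the compiler picks the register */
    __asm__ ("mov %1, %2      \n\t"   /* scratch is written here...      */
             "add %0, %1, %3  "       /* ...before input %3 is read      */
             : "=r" (result), "=&r" (scratch)  /* & = early-clobber */
             : "r" (a), "r" (b));
#else
    result = a + b;    /* plain C fallback for non-ARM hosts */
#endif
    return result;
}
```

Without the & on the scratch operand, the compiler would be free to put scratch in the same register as b, and the mov would destroy b before the add reads it.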

    I'm surprised the template string doesn't use r3. If it does, you forgot to tell the compiler about it, which is undefined behaviour that will bite you when this function inlines. You mentioned crashes; that could be the cause, if your ldr isn't.

    Using %0 instead of r0 in the "vendor code" will get the compiler to fill in the register name it picked. Normally it will pick r0 for the C variable whose value was already there, unless the function inlines and the value is in a different register.

    I'm assuming the asm template modifies that register, which is why I made it an input/output operand with "+r"(reg_r0), with the output side basically being a dummy to let the compiler know that register changed. You can't declare a clobber on a register that's also an operand, and if you're letting the compiler pick registers you wouldn't even know which one.

    If any of the input registers are left unmodified by the asm template, make them pure inputs. You can use [name] "r"(c_var) in the operands and %[name] in the template string to use names instead of numbers, making it easy to move them around without having to renumber and keep track of which operand is which number.
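A short sketch of named operands with pure inputs, using a single add instruction as an assumed example rather than the vendor code: add_named and the operand names sum, lhs and rhs are hypothetical. As above, the asm is guarded by __arm__ with a plain C fallback elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b. The template leaves a and b
   unmodified, so both are pure "r" inputs and only sum is an output. */
uint32_t add_named(uint32_t a, uint32_t b)
{
    uint32_t sum;
#if defined(__arm__)
    __asm__ ("add %[sum], %[lhs], %[rhs]"
             : [sum] "=r" (sum)
             : [lhs] "r" (a), [rhs] "r" (b));
#else
    sum = a + b;  /* plain C fallback for non-ARM hosts */
#endif
    return sum;
}
```

Reordering or adding operands here does not require touching the template, which is the main benefit over %0, %1, %2.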


    Separate .S file:

    Should I somehow just have this be a pure assembly function in a separate .S file?

    That's 100% a valid option, especially if call/ret overhead is minor compared to how long this takes, or it's not called all the time.

    Look at compiler-generated asm (gcc -S) if you're not sure about the syntax for declaring a function (.globl foo ; foo: to define the symbol, put its machine code after it.) And of course push and pop any call-preserved registers your function uses.

    (GNU C inline asm requires you to describe the asm precisely to the compiler; the function-calling convention is irrelevant because it's inline asm. You're dancing with the compiler and need to not step on its toes, instead of just following the standard calling convention.)