assembly gcc arm inline-assembly semihosting

What's the point of providing input and output operands if they are not specified in ASM template?

I found the following piece of code in u-boot/arch/arm/lib/semihosting.c that uses bkpt and other instructions and provides input and output operands even though they are not specified in the ASM template:

static noinline long smh_trap(unsigned int sysnum, void *addr)
{
    register long result asm("r0");
#if defined(CONFIG_ARM64)
    asm volatile ("hlt #0xf000" : "=r" (result) : "0"(sysnum), "r"(addr));
#elif defined(CONFIG_CPU_V7M)
    asm volatile ("bkpt #0xAB" : "=r" (result) : "0"(sysnum), "r"(addr));
#else
    /* Note - untested placeholder */
    asm volatile ("svc #0x123456" : "=r" (result) : "0"(sysnum), "r"(addr));
#endif
    return result;
}

Minimal, verifiable example:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  register long result asm("r0");
  void *addr = 0;
  unsigned int sysnum = 0;
  __asm__ volatile ("bkpt #0xAB" : "=r" (result) : "0"(sysnum), "r"(addr));

  return EXIT_SUCCESS;
}

According to the ARM Architecture Reference Manual, the bkpt instruction takes a single imm parameter, and according to my reading of the GCC manual section on inline assembly, GCC does not allow providing operands that are not referenced in the template. Output assembly generated with -S:

    .arch armv6
    .eabi_attribute 28, 1
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 6
    .eabi_attribute 34, 1
    .eabi_attribute 18, 4
    .file   "bkpt-so.c"
    .text
    .align  2
    .global main
    .arch armv6
    .syntax unified
    .arm
    .fpu vfp
    .type   main, %function
main:
    @ args = 0, pretend = 0, frame = 8
    @ frame_needed = 1, uses_anonymous_args = 0
    @ link register save eliminated.
    str fp, [sp, #-4]!
    add fp, sp, #0
    sub sp, sp, #12
    mov r3, #0
    str r3, [fp, #-8]
    mov r3, #0
    str r3, [fp, #-12]
    ldr r2, [fp, #-12]
    ldr r3, [fp, #-8]
    mov r0, r2
    .syntax divided
@ 10 "bkpt-so.c" 1
    bkpt #0xAB
@ 0 "" 2
    .arm
    .syntax unified
    mov r3, #0
    mov r0, r3
    add sp, fp, #0
    @ sp needed
    ldr fp, [sp], #4
    bx  lr
    .size   main, .-main
    .ident  "GCC: (Raspbian 8.3.0-6+rpi1) 8.3.0"
    .section    .note.GNU-stack,"",%progbits

So what's the point of "=r" (result) : "0"(sysnum), "r"(addr) in this line:

__asm__ volatile ("bkpt #0xAB" : "=r" (result) : "0"(sysnum), "r"(addr));

Solution

The fact that this code exists in a well-known project like U-Boot does not instill confidence. The code relies on the fact that the ARM ABI (procedure call standard) passes the first four scalar arguments in r0 (argument 1), r1 (argument 2), r2 (argument 3), and r3 (argument 4).

Table 6.1 summarizes the ABI:

The assumption the U-Boot code makes is that addr, which was passed to the function in r1, is still in r1 when the inline assembly is emitted. I consider this dangerous because, even in a simple non-inlined function, GCC doesn't guarantee this behaviour: the "r" constraint only asks for some general-purpose register, not r1 specifically. In the -S output from the question, for instance, addr is loaded into r3, not r1. The code is fragile; it has probably never caused a problem, but in theory it could. Relying on the compiler's code generation behaviour is not a good idea.

I believe it would have been better written as:

static noinline long smh_trap(unsigned int sysnum, void *addr)
{
    register long result asm("r0");
    register void *reg_r1 asm("r1") = addr;
#if defined(CONFIG_ARM64)
    asm volatile ("hlt #0xf000" : "=r" (result) : "0"(sysnum), "r"(reg_r1) : "memory");
#elif defined(CONFIG_CPU_V7M)
    asm volatile ("bkpt #0xAB" : "=r" (result) : "0"(sysnum), "r"(reg_r1) : "memory");
#else
    /* Note - untested placeholder */
    asm volatile ("svc #0x123456" : "=r" (result) : "0"(sysnum), "r"(reg_r1) : "memory");
#endif
    return result;
}

This code passes addr through a variable (reg_r1) that is bound to register r1 when it is used as an inline assembly operand. At higher optimization levels the compiler does not generate any extra code for the extra variable. I have also added a memory clobber because it is not a good idea to pass a memory address through a register in this way without one; omitting it becomes a problem if someone makes an inlined version of this function. The memory clobber ensures that any data is written to memory before the inline assembly runs and, if necessary, reloaded afterwards.

As for the question of what "=r" (result) : "0"(sysnum), "r"(addr) does:

"=r"(result) is an output constraint that tells the compiler that the value in register r0 after the inline assembly completes will be placed in the variable result.
"0"(sysnum) is an input constraint that tells the compiler that sysnum will be passed into the inline assembly through the same register as operand 0 (operand 0 uses register r0).
"r"(addr) passes addr through a register, and the U-Boot code assumes that register will be r1. In my version it is explicitly bound to r1.

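The point that operands take effect even when the template never mentions them can be checked on any GCC or Clang target with an empty template. This is a minimal sketch, not the U-Boot code: the function name passthrough and its parameters are hypothetical, and no trap instruction is executed.

```c
/* Hypothetical helper: mirrors the operand list of smh_trap,
 * but with an empty template so it runs on any architecture. */
static long passthrough(long in, void *extra)
{
    long out;

    /* "=r"(out)  : operand 0, some general-purpose register.
     * "0"(in)    : `in` is loaded into the same register as operand 0.
     * "r"(extra) : `extra` is loaded into some other register.
     * Nothing in the template touches these registers, so after the
     * asm statement `out` simply reads back the value of `in`. */
    __asm__ volatile ("" : "=r"(out) : "0"(in), "r"(extra));

    return out;
}
```

Calling passthrough(42, 0) returns 42: the compiler set up the registers as the constraints asked, even though the template text never refers to %0, %1, or %2. With a real trap instruction in the template, the debugger or host would read those registers and overwrite the one holding operand 0.
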
Information on operands and constraints for extended inline assembly can be found in the GCC documentation, which also lists additional machine-specific constraints.

hlt, bkpt, and svc are all being used as system calls to have a system service performed through the debugger (semihosting). Arm's semihosting documentation covers this in more detail. Different ARM architectures use slightly different mechanisms. The convention for a semihosting system call is that r0 contains the system call number, r1 contains the first argument of the system call, and the system call places a return value in r0 before returning to user code.