c++ assembly inline-assembly 32-bit powerpc

Properly gathering return value(s) from inline assembly


I want to use inline assembly to execute a syscall on a 32-bit PowerPC architecture. After performing the syscall, I also want to return its results by taking the values of r3 and r4 and putting them into a long. My function looks as follows:

constexpr auto maximum_syscall_parameter_count = 8;

long execute_system_call_with_arguments(short value, const int parameters_array[maximum_syscall_parameter_count]) {    
    char return_value_buffer[sizeof(long)];

    // syscall value
    asm volatile("mr 0, %0" : : "r" (value));

    // Pass the parameters
    asm volatile("mr 3, %0" : : "r" (parameters_array[0]));
    asm volatile("mr 4, %0" : : "r" (parameters_array[1]));
    asm volatile("mr 5, %0" : : "r" (parameters_array[2]));
    asm volatile("mr 6, %0" : : "r" (parameters_array[3]));
    asm volatile("mr 7, %0" : : "r" (parameters_array[4]));
    asm volatile("mr 8, %0" : : "r" (parameters_array[5]));
    asm volatile("mr 9, %0" : : "r" (parameters_array[6]));
    asm volatile("mr 10, %0" : : "r" (parameters_array[7]));

    // Execute the syscall
    asm volatile ("sc");

    // Retrieve the return value
    asm volatile ("mr %0, 3" : "=r" (*(int *) &return_value_buffer));
    asm volatile ("mr %0, 4" : "=r" (*(int *) &return_value_buffer[sizeof(int)]));

    return *(long *) &return_value_buffer;
}

This seems to generate correct code, but it feels hacky, and two redundant instructions are generated:

mr        r0, r30
lwz       r9, 0(r31)
mr        r3, r9
lwz       r9, 4(r31)
mr        r4, r9
lwz       r9, 8(r31)
mr        r5, r9
lwz       r9, 0xC(r31)
mr        r6, r9
lwz       r9, 0x10(r31)
mr        r7, r9
lwz       r9, 0x14(r31)
mr        r8, r9
lwz       r9, 0x18(r31)
mr        r9, r9
lwz       r9, 0x1C(r31)
mr        r10, r9
sc
mr        r3, r3 # Redundant
mr        r9, r4 # Redundant
blr

My goal is to simply return with r3 and r4 as set by the sc instruction. However, removing the return value or the last two inline assembly statements from the source code breaks the function: it either crashes on return or returns 0.


Solution

  • Let me start by re-iterating what I said above: I don't speak PPC asm, and I don't have a PPC to run this code on. So while I believe that generally this is the direction you should proceed, don't take this code as gospel.

    Next, the reason both Jester and I suggested using local register variables is that it results in better (and arguably more readable/maintainable) code. The reason for that is this line in the gcc docs:

    GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input.

    With that in mind, what happens when you use code like you have above, and call the routine with code like:

    int parameters_array[maximum_syscall_parameter_count] = {1, 2, 3, 4, 5, 6, 7};
    
    long a = execute_system_call_with_arguments(9, parameters_array);
    

    Since the compiler doesn't know what's going to happen inside that asm block, it must write everything to memory, which the asm block then reads back from memory into registers. With code like the version below, the compiler can be smart enough to skip allocating the memory at all and load the registers directly. This can be even more useful if you call execute_system_call_with_arguments more than once with (essentially) the same parameters.

    constexpr auto maximum_syscall_parameter_count = 7;
    
    long execute_system_call_with_arguments(const int value, const int parameters_array[maximum_syscall_parameter_count]) {    
        int return_value_buffer[2];
    
        register int foo0 asm("0") = value;
    
        register int foo1 asm("3") = parameters_array[0];
        register int foo2 asm("4") = parameters_array[1];
        register int foo3 asm("5") = parameters_array[2];
        register int foo4 asm("6") = parameters_array[3];
        register int foo5 asm("7") = parameters_array[4];
        register int foo6 asm("8") = parameters_array[5];
        register int foo7 asm("9") = parameters_array[6];
    
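        // Execute the syscall. Note that foo3/foo4 are bound to r5/r6, so
        // these are the registers read back below; to read r3/r4 instead,
        // foo1/foo2 would have to be the "+r" operands.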
        asm volatile ("sc"
        : "+r"(foo3), "+r"(foo4)
        : "r"(foo0), "r"(foo1), "r"(foo2), "r"(foo5), "r"(foo6), "r"(foo7)
        );
    
        return_value_buffer[0] = foo3;
        return_value_buffer[1] = foo4;
    
        return *(long *) &return_value_buffer;
    }
    

    When called with the example above, this produces:

    .L.main:
        li 0,9
        li 3,1
        li 4,2
        li 5,3
        li 6,4
        li 7,5
        li 8,6
        li 9,7
        sc
        extsw 3,6
        blr
    

    Keeping as much code as possible outside the asm template (constraints are considered "outside") allows gcc's optimizers to do all sorts of useful things.
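
    The question asks for the values left in r3 and r4, whereas the listing above marks foo3 and foo4 (r5 and r6) as the in/out operands. A minimal sketch of the same idea with foo1 and foo2 as the "+r" operands follows. Treat all of it as an assumption rather than tested PPC code: the pack_register_pair helper and the 64-bit return type are hypothetical additions, r3 is assumed to hold the high word, and the clobber list is left empty only for brevity. The wrapper is guarded so that it is compiled only for 32-bit PowerPC with GCC-style inline asm.

```cpp
#include <cstdint>

// Hypothetical helper (not in the original post): combine two 32-bit register
// values with shifts instead of casting a buffer pointer to long.
// Assumption: r3 carries the high word and r4 the low word.
constexpr std::uint64_t pack_register_pair(std::uint32_t r3_value,
                                           std::uint32_t r4_value) {
    return (static_cast<std::uint64_t>(r3_value) << 32) | r4_value;
}

constexpr int maximum_syscall_parameter_count = 7;

#if defined(__GNUC__) && defined(__powerpc__) && !defined(__powerpc64__)
std::uint64_t execute_system_call_with_arguments(
    int value, const int parameters_array[maximum_syscall_parameter_count]) {
    // Bind each value to the register the syscall expects it in.
    register int foo0 asm("0") = value;
    register int foo1 asm("3") = parameters_array[0];
    register int foo2 asm("4") = parameters_array[1];
    register int foo3 asm("5") = parameters_array[2];
    register int foo4 asm("6") = parameters_array[3];
    register int foo5 asm("7") = parameters_array[4];
    register int foo6 asm("8") = parameters_array[5];
    register int foo7 asm("9") = parameters_array[6];

    // r3 and r4 are both inputs and outputs; the rest are inputs only.
    // Assumption: sc changes nothing else. If it does, extend the lists.
    asm volatile("sc"
                 : "+r"(foo1), "+r"(foo2)
                 : "r"(foo0), "r"(foo3), "r"(foo4), "r"(foo5), "r"(foo6),
                   "r"(foo7));

    return pack_register_pair(static_cast<std::uint32_t>(foo1),
                              static_cast<std::uint32_t>(foo2));
}
#endif
```

    Under the 32-bit PowerPC SysV ABI a 64-bit integer is returned in r3 and r4, so, if that ABI applies to your target, the compiler should not need extra moves after sc. A plain long is typically only 32 bits wide on such targets, which is why this sketch does not return long.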

    A few other points:

    1. If any of the items in parameters_array are (or might be) pointers, you're going to need to add the "memory" clobber. This ensures that any values the compiler is holding in registers get flushed to memory before the asm statement executes. Adding the memory clobber when it's not needed may slow down execution by a couple of instructions. Omitting it when it is needed could result in the asm reading stale data.
    2. If sc modifies any registers that aren't listed here, you must list them as clobbers. And if any registers that ARE listed here (other than foo3 & foo4) change, you must make them input+outputs as well (does sc put a return code in foo0?). Even if you "don't use them" after the asm call, if they change, you HAVE to inform the compiler. As the gcc docs explicitly warn:

    Do not modify the contents of input-only operands (except for inputs tied to outputs). The compiler assumes that on exit from the asm statement these operands contain the same values as they had before executing the statement.

    Failure to heed this warning can result in code that seems to work fine one day, then suddenly causes bizarre failures at some point after (sometimes well after) the asm block. This "works and then suddenly doesn't" is one of the reasons I suggest that you don't use inline asm, but if you must (which you kinda do if you need to call sc directly), keep it as tiny as you can.

    3. I cheated a bit by changing maximum_syscall_parameter_count to 7. Apparently godbolt's gcc doesn't optimize this code as well with more parameters. There might be ways around this if that's necessary, but you'll want a better PPC expert than me to work them out.
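
    To illustrate the first point, here is a minimal sketch of an asm statement that takes a pointer input together with the "memory" clobber. The function names are hypothetical, and the empty template is only a stand-in for sc so that the snippet compiles on any GCC or Clang target; on PowerPC the template and register operands would be the ones shown earlier.

```cpp
#include <cstddef>

// Hypothetical stand-in for the real syscall: the empty template emits no
// instruction. Only the constraint and clobber lists matter for this sketch.
inline void syscall_with_pointer_argument(const char *buffer) {
    asm volatile(""             // placeholder for "sc"
                 :              // no outputs in this sketch
                 : "r"(buffer)  // the pointer value itself is an input
                 : "memory");   // the bytes it points to may be read or written
}

// Fills the buffer, then hands its address to the asm statement.
// Precondition: size > 0.
// Because of the "memory" clobber, the compiler has to finish the stores
// before the asm and reload buffer contents after it.
inline char fill_and_call(char *buffer, std::size_t size, char fill) {
    for (std::size_t i = 0; i < size; ++i) {
        buffer[i] = fill;
    }
    syscall_with_pointer_argument(buffer);
    return buffer[size - 1];
}
```

    Without the clobber, nothing in the operand list tells the compiler that the asm reads through the pointer, so it would be free to delay or drop the stores to buffer.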