I'm trying to call a golang function from my C code. Golang does not use the standard x86_64 calling convention, so I have to resort to implementing the transition myself. As gcc does not want to mix cdecl with the x86_64 convention, I'm trying to call the function using inline assembly:
void go_func(struct go_String filename, void* key, int error){
void* f_address = (void*)SAVEECDSA;
asm volatile(" sub rsp, 0xe0; \t\n\
mov [rsp+0xe0], rbp; \t\n\
mov [rsp], %0; \t\n\
mov [rsp+0x8], %1; \t\n\
mov [rsp+0x18], %2; \t\n\
call %3; \t\n\
mov rbp, [rsp+0xe0]; \t\n\
add rsp, 0xe0;"
:
: "g"(filename.str), "g"(filename.len), "g"(key), "g"(f_address)
: );
return;
}
Sadly, the compiler always throws an error that I don't understand:
./code.c:241: Error: too many memory references for `mov'
This corresponds to this line: mov [rsp+0x18], %2; \t\n\
If I delete it, the compilation works. I don't understand what my mistake is...
I'm compiling with the -masm=intel flag so I use Intel syntax. Can someone please help me?
A "g" constraint allows the compiler to pick memory, a register, or an immediate, so you'll end up with mov mem,mem whenever it picks memory. mov can have at most 1 memory operand. (Like all x86 instructions, at most one explicit memory operand is possible.)
Use "ri" constraints for the inputs that will be moved to a memory destination, to allow register or immediate but not memory.
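As a minimal sketch of that constraint choice (the function name store_u64 is made up for illustration, not from the question): the destination is a memory operand and the source is restricted to register-or-immediate, so the compiler can never produce mov mem,mem. I've used "re" rather than "ri" here, because for 64-bit operands "e" limits constants to the sign-extended 32-bit immediates that mov can actually encode; for run-time values like pointers the two behave the same. The {att|intel} braces are GCC's dialect alternatives, only there so the sketch assembles with or without -masm=intel.

```c
#include <stdint.h>

/* Hypothetical helper: store `value` into `*slot` with a single mov.
 * [dst] is memory, [src] is a register or a 32-bit sign-extended
 * immediate, never memory, so the instruction has one memory operand. */
void store_u64(uint64_t *slot, uint64_t value)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("mov{q %[src], %[dst]| %[dst], %[src]}"
            : [dst] "=m"(*slot)
            : [src] "re"(value));
#else
    *slot = value;  /* portable fallback so the sketch builds elsewhere */
#endif
}
```

With "g" in place of "re", the same template can fail to assemble as soon as the compiler decides to leave value in memory.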
Also, you're modifying RSP so you can't safely use memory source operands. The compiler is going to assume it can use addressing modes like [rsp+16] or [rsp-4]. With register or immediate inputs that isn't a problem, so you can use push instead of mov.
You also need to declare clobbers on all the call-clobbered registers, because the function call will do that. (Or better, maybe ask for the inputs in those call-clobbered registers so the compiler doesn't have to bounce them through call-preserved regs like RBX. But you need to make those operands read/write or declare separate output operands for the same registers to let the compiler know they'll be modified.)
So probably your best bet for efficiency is something like
int ecx, edx, edi, esi;        // dummy outputs as clobbers
register int r8 asm("r8d");    // for all the call-clobbered regs in the calling convention
register int r9 asm("r9d");
register int r10 asm("r10d");
register int r11 asm("r11d");
// These are the regs for x86-64 System V.
// **I don't know what Go actually clobbers.**
asm volatile(                  // volatile: the call has side effects even if retval is unused
    "sub rsp, 0xe0\n\t"        // adjust as necessary to align the stack before a call
    // push args in reverse order
    "push %[fn_len] \n\t"
    "push %[fn_str] \n\t"
    "call %[fn_addr] \n\t"     // the call needs a target; pass the address as a register input
    "add rsp, 0xe0 + 3*8 \n\t" // pop red-zone skip space + pushed args (3*8 must match the number of pushes)
    // real output in RAX, and dummy outputs in call-clobbered regs
    : "=a"(retval), "=c"(ecx), "=d"(edx), "=D"(edi), "=S"(esi), "=r"(r8), "=r"(r9), "=r"(r10), "=r"(r11)
    : [fn_str] "ri" (filename.str), [fn_len] "ri" (filename.len), [fn_addr] "r" (f_address) /* etc. */ // inputs can use the same regs as dummy outputs
    : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7", // All vector regs are call-clobbered
      "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
      "memory" // if you're passing any pointers (even read-only), or the function accesses any globals,
               // best to make this a compiler memory barrier
);
Notice that the outputs are not early-clobber, so the compiler can (at its option) use those registers for inputs, but we're not forcing it, so the compiler is still free to use some other register or an immediate.
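Since I can't test against Go, here is a self-contained sketch of the same clobber discipline for a callee that follows the x86-64 System V convention. That's an assumption, and it is not Go's convention. The names add_one and call_sysv are made up for illustration. The argument is a read/write operand in RDI, the result comes back in RAX, and everything else the callee may trash is listed as a clobber. RBX holds the old RSP while the stack is moved past the red zone and aligned. The clobber list is not exhaustive for x87 or AVX-512 state, and the {att|intel} braces are only there so it assembles with or without -masm=intel.

```c
/* Hypothetical System V callee used only to exercise the wrapper. */
long add_one(long x) { return x + 1; }

/* Call fn(arg) from inline asm, telling the compiler what the call clobbers. */
long call_sysv(long (*fn)(long), long arg)
{
#if defined(__GNUC__) && defined(__x86_64__) && !defined(_WIN32)
    long ret;
    __asm__ volatile(
        "mov{q %%rsp, %%rbx| rbx, rsp}\n\t"  /* save RSP in a call-preserved reg */
        "sub{q $128, %%rsp| rsp, 128}\n\t"   /* step over the red zone */
        "and{q $-16, %%rsp| rsp, -16}\n\t"   /* 16-byte alignment before call */
        "call{q *%[fn]| %[fn]}\n\t"
        "mov{q %%rbx, %%rsp| rsp, rbx}"      /* restore the original RSP */
        : "=a"(ret),          /* return value in RAX */
          "+D"(arg)           /* first arg in RDI, modified by the call */
        : [fn] "r"(fn)
        : "rbx", "rcx", "rdx", "rsi", "r8", "r9", "r10", "r11",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
          "memory", "cc");
    return ret;
#else
    return fn(arg);  /* portable fallback so the sketch builds elsewhere */
#endif
}
```

For example, call_sysv(add_one, 41) goes through the asm path on x86-64 GCC or clang. Plain clobbers are simpler than dummy outputs, but they stop the compiler from placing inputs in those registers, which is the efficiency trade-off mentioned above.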
Upon further discussion, Go functions don't clobber RBP, so there's no reason to save/restore it manually. The only reason you might have wanted to is that locals might use RBP-relative addressing modes, and older GCC made it an error to declare a clobber on RBP when compiling without -fomit-frame-pointer. (I think. Or maybe I'm thinking of EBX in 32-bit PIC code.)
Also, if you're using the x86-64 System V ABI, beware that inline asm must not clobber the red-zone (the 128 bytes below RSP). The compiler assumes that doesn't happen, and there's no way to declare a clobber on the red zone or even set -mno-redzone on a per-function basis. So you probably need to sub rsp, 128 + 0xe0. Or maybe 0xe0 already includes enough space to skip the red-zone, if that's not part of the callee's args.
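A minimal sketch of the red-zone skip on its own, assuming x86-64 with GCC or clang (push_pop_roundtrip is a made-up name): any push inside inline asm writes just below RSP, which is exactly where a leaf function may keep its locals, so the asm moves RSP down by 128 first and restores it afterwards. The input uses "r" so it can never be an RSP-relative memory operand while RSP is changed.

```c
#include <stdint.h>

/* Hypothetical demo: push a value and pop it back without
 * overwriting anything the compiler may keep in the red zone. */
uint64_t push_pop_roundtrip(uint64_t v)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t out;
    __asm__ volatile(
        "sub{q $128, %%rsp| rsp, 128}\n\t"  /* step over the red zone */
        "push %[in]\n\t"                    /* safe: lands below the red zone */
        "pop %[out]\n\t"
        "add{q $128, %%rsp| rsp, 128}"      /* put RSP back */
        : [out] "=r"(out)
        : [in] "r"(v)
        : "memory", "cc");
    return out;
#else
    return v;  /* portable fallback so the sketch builds elsewhere */
#endif
}
```

Without the sub/add pair this would usually still appear to work, which is what makes red-zone corruption hard to notice: it only bites when the compiler happens to keep a live local there.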