gcc assembly x86 c++builder calling-convention

Use GCC generated assembler inside C++ Builder

I'm using C++Builder for a GUI application on Win32. The Borland compiler optimizes poorly and does not know how to use SSE. I have a function that runs 5 times faster when compiled with MinGW GCC 4.7. I am thinking about asking GCC to generate assembly code and then using that code inside my C function, since the Borland compiler allows inline assembler.

The function in C looks like this:

void Test_Fn(double *x, size_t n, double *AV, size_t *mA, size_t NT)
{
    double s = 77.777;
    size_t m = mA[NT-3];
    AV[2] = x[n-4] + m*s;
}

I made the function code very simple in order to simplify my question. My real function contains many loops.

The Borland C++ compiler generated this assembly code:

  ;
  ; void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
  ;
  @1:
push      ebp
mov       ebp,esp
add       esp,-16
push      ebx
 ;
 ;  {
 ;      double s = 77.777;
 ;
mov       dword ptr [ebp-8],1580547965
mov       dword ptr [ebp-4],1079210426
 ;
 ;      size_t m = mA[NT-3];
 ;
mov       edx,dword ptr [ebp+20]
mov       ecx,dword ptr [ebp+24]
mov       eax,dword ptr [edx+4*ecx-12]
 ;
 ;      AV[2]=x[n-4]+m*s;
 ;
 ?live16385@48: ; EAX = m
xor       edx,edx
mov       dword ptr [ebp-16],eax
mov       dword ptr [ebp-12],edx
fild      qword ptr [ebp-16]
mov       ecx,dword ptr [ebp+8]
mov       ebx,dword ptr [ebp+12]
mov       eax,dword ptr [ebp+16]
fmul      qword ptr [ebp-8]
fadd      qword ptr [ecx+8*ebx-32]
fstp      qword ptr [eax+16]
 ;
 ;  }
 ;
 ?live16385@64: ;
 @2:
pop       ebx
mov       esp,ebp
pop       ebp
ret

While the gcc generated assembler code is :

 _Test_Fn:
mov edx, DWORD PTR [esp+20]
mov eax, DWORD PTR [esp+16]
mov eax, DWORD PTR [eax-12+edx*4]
mov edx, DWORD PTR [esp+8]
add eax, -2147483648
cvtsi2sd    xmm0, eax
mov eax, DWORD PTR [esp+4]
addsd   xmm0, QWORD PTR LC0
mulsd   xmm0, QWORD PTR LC1
addsd   xmm0, QWORD PTR [eax-32+edx*8]
mov eax, DWORD PTR [esp+12]
movsd   QWORD PTR [eax+16], xmm0
ret
 LC0:
   .long    0
   .long    1105199104
   .align 8
 LC1:
   .long    1580547965
   .long    1079210426
   .align 8

I would like help understanding how function arguments are accessed in GCC and in Borland C++. My function in C++ for Borland would be something like:

 void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
 {
__asm
  {
  put gcc generated assembler here
  }
 }

Borland accesses the arguments through the ebp register, while gcc uses the esp register. Can I force one of the compilers to generate compatible code for accessing the arguments, using a calling convention such as cdecl or stdcall?

Solution

The arguments are passed the same way in both cases. The difference is that the code generated by Borland expresses the argument locations relative to the EBP register and GCC expresses them relative to ESP, but both refer to the same addresses.

Borland sets EBP to point to the start of the function's stack frame and expresses locations relative to that, while GCC doesn't set up a new stack frame but expresses locations relative to ESP, which on entry points at the return address the caller pushed at the end of its own stack frame.

The code generated by Borland sets up a stack frame at the beginning of the function, causing EBP in the Borland code to be equal to ESP in the GCC code decreased by 4. For example, [ebp+8] in the Borland listing and [esp+4] in the GCC listing both refer to the first argument, x. This can be seen by looking at the first two Borland lines:

push      ebp     ; decrease esp by 4
mov       ebp,esp ; ebp = the original esp decreased by 4

The GCC code doesn't alter ESP and the Borland code doesn't alter EBP until the end of the procedure, so the relationship holds whenever the arguments are accessed.
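
To make the offset relationship concrete, here is a small sketch that computes where argument number i of Test_Fn lives in each listing. The function names esp_offset and ebp_offset are made up for illustration, and the sketch assumes 32-bit cdecl where every argument of this function (pointers and size_t) takes one 4-byte stack slot.

```cpp
#include <cassert>

// Assumption: 32-bit x86 cdecl, each argument of Test_Fn takes one 4-byte slot.
constexpr int kSlotSize = 4;

// Offset from ESP at function entry (GCC listing, no frame pointer).
// [esp+0] holds the return address, so argument 0 starts at [esp+4].
int esp_offset(int arg_index) {
    assert(arg_index >= 0);
    return kSlotSize + kSlotSize * arg_index;
}

// Offset from EBP after "push ebp / mov ebp,esp" (Borland listing).
// The saved EBP occupies one more slot, so every offset grows by 4.
int ebp_offset(int arg_index) {
    return esp_offset(arg_index) + kSlotSize;
}
```

With the arguments numbered x = 0, n = 1, AV = 2, mA = 3, NT = 4, this gives [esp+20] and [ebp+24] for NT, which matches the two listings in the question.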

The calling convention seems to be cdecl in both cases, and there's no difference in how the functions are called. You can add the keyword __cdecl to both declarations to make that explicit.

 void __cdecl Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)

However, pasting assembly generated by GCC into a function compiled with Borland is not straightforward, because Borland might set up a stack frame even if the function body contains only inline assembly, causing the value of the ESP register to differ from the one the GCC code assumes. I see three possible workarounds:

Compile with Borland without the option "Standard stack frames". If the compiler figures out that a stack frame is not needed, this might work.
Compile with GCC without the option -fomit-frame-pointer. This should make sure that at least the value of EBP is the same in both. The option is enabled at levels -O, -O2, -O3 and -Os, so at those levels it has to be turned off explicitly with -fno-omit-frame-pointer.
Manually edit the assembly produced by GCC, changing references to ESP into references to EBP and adding 4 to the offset (for example, [esp+20] becomes [ebp+24]).
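
The third workaround is mechanical enough to script. Below is a minimal sketch, not a complete tool: the function name rebase_esp_to_ebp is hypothetical, and it only handles operands of the form [esp+N] as they appear in the GCC listing above. It assumes Borland emits the standard "push ebp / mov ebp,esp" prologue and that the pasted code never changes ESP itself.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Rewrites every "[esp+N]" operand in one line of Intel-syntax assembly
// to "[ebp+(N+4)]". Everything else in the line is copied unchanged.
// Assumption: only the plain "[esp+N]" form occurs, as in the GCC listing.
std::string rebase_esp_to_ebp(const std::string& line) {
    static const std::regex esp_ref(R"(\[esp\+(\d+)\])", std::regex::icase);

    std::string out;
    std::size_t last = 0;  // end of the previously copied match
    for (auto it = std::sregex_iterator(line.begin(), line.end(), esp_ref);
         it != std::sregex_iterator(); ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position(0));
        out.append(line, last, pos - last);           // text before the match
        const int offset = std::stoi(m[1].str());     // N in [esp+N]
        out += "[ebp+" + std::to_string(offset + 4) + "]";
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(line, last, std::string::npos);        // text after last match
    return out;
}
```

Feeding it "mov edx, DWORD PTR [esp+20]" yields "mov edx, DWORD PTR [ebp+24]", while lines without an ESP operand pass through untouched. The sketch does not deal with the trailing ret or with the LC0 and LC1 constants from the GCC output; those would still need to be handled by hand.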