I'm using C++builder for GUI application on Win32. Borland compiler optimization is very bad and does not know how to use SSE. I have a function that is 5 times faster when compiled with mingw gcc 4.7. I think about asking gcc to generate assembler code and then use this cod inside my C function because Borland compiler allows inline assembler.
The function in C looks like this :
void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
{
double s = 77.777;
size_t m = mA[NT-3];
AV[2]=x[n-4]+m*s;
}
I made the function code very simple in order to simplify my question. My real function contains many loops.
The Borland C++ compiler generated this assembler code :
;
; void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
;
@1:
push ebp
mov ebp,esp
add esp,-16
push ebx
;
; {
; double s = 77.777;
;
mov dword ptr [ebp-8],1580547965
mov dword ptr [ebp-4],1079210426
;
; size_t m = mA[NT-3];
;
mov edx,dword ptr [ebp+20]
mov ecx,dword ptr [ebp+24]
mov eax,dword ptr [edx+4*ecx-12]
;
; AV[2]=x[n-4]+m*s;
;
?live16385@48: ; EAX = m
xor edx,edx
mov dword ptr [ebp-16],eax
mov dword ptr [ebp-12],edx
fild qword ptr [ebp-16]
mov ecx,dword ptr [ebp+8]
mov ebx,dword ptr [ebp+12]
mov eax,dword ptr [ebp+16]
fmul qword ptr [ebp-8]
fadd qword ptr [ecx+8*ebx-32]
fstp qword ptr [eax+16]
;
; }
;
?live16385@64: ;
@2:
pop ebx
mov esp,ebp
pop ebp
ret
While the gcc generated assembler code is :
_Test_Fn:
mov edx, DWORD PTR [esp+20]
mov eax, DWORD PTR [esp+16]
mov eax, DWORD PTR [eax-12+edx*4]
mov edx, DWORD PTR [esp+8]
add eax, -2147483648
cvtsi2sd xmm0, eax
mov eax, DWORD PTR [esp+4]
addsd xmm0, QWORD PTR LC0
mulsd xmm0, QWORD PTR LC1
addsd xmm0, QWORD PTR [eax-32+edx*8]
mov eax, DWORD PTR [esp+12]
movsd QWORD PTR [eax+16], xmm0
ret
LC0:
.long 0
.long 1105199104
.align 8
LC1:
.long 1580547965
.long 1079210426
.align 8
I like to get help about how the function arguments acces is done in gcc and Borland C++. My function in C++ for Borland would be something like :
void Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
{
__asm
{
put gcc generated assembler here
}
}
Borland starts using ebp
register while gcc use esp
register.
Can I force one of the compilers to generate compatible code for accessing the arguments using some calling conventions like cdecl ou stdcall ?
The arguments are passed similarly in both cases. The difference is that the code generated by Borland expresses the argument locations relative to EBP register and GCC relative to ESP, but both of them refer to the same addresses.
Borlands sets EBP to point to the start of the function's stack frame and expresses locations relative to that, while GCC doesn't set up a new stack frame but expresses locations relative to ESP, which the caller has left pointing to the end of the caller's stack frame.
The code generated by Borland sets up a stack frame at the beginning of the function, causing EBP in the Borland code to be equal to ESP in the GCC code decreased by 4. This can be seen by looking at the first two Borland lines:
push ebp ; decrease esp by 4
mov ebp,esp ; ebp = the original esp decreased by 4
The GCC code doesn't alter ESP and Borland code doesn't alter EBP until the end of the procedure, so the relationsip holds when the arguments are accessed.
The calling convention seems to be cdecl in both of the cases, and there's no difference in how the functions are called. You can add keyword __cdecl
to both in order to make that clear.
void __cdecl Test_Fn(double *x, size_t n,double *AV, size_t *mA, size_t NT)
However adding inline assembly compiled with GCC to the function compiled with Borland is not straightforward, because Borland might set up a stack frame even if the function body contains only inline assembly, causing the value of ESP register to differ from the one used in the GCC code. I see three possible workarounds: