gcc cygwin mingw x86-64 calling-convention

How to make my C program compiled with SysV calling convention run under MinGW


My platform is x86_64 + Windows 10 + Cygwin. My compiler is x86_64-w64-mingw32-gcc.

For some reason, I had to compile my program with -mabi=sysv option, and I would like to avoid the default -mabi=ms option if it is possible at all.

The program compiled successfully. But when it calls library functions like printf, it segfaults. The reason is that the library functions reside in msvcrt.dll, which was probably prebuilt with a calling convention other than -mabi=sysv.

So, is there a way to install libraries compiled with -mabi=sysv in Cygwin?


Solution

  • You generally want to avoid this, e.g. by using macros in your asm to adapt it between calling conventions. Agner Fog's calling convention guide has some suggestions, e.g. using #ifdef or %if to put some extra mov instructions ahead of the function, moving args into the registers the rest of the function was written for. (And then don't use the red zone or shadow space, i.e. stick to the lowest common denominator of the two ABIs, if your function can still work efficiently that way.)

    IDK if the same .cfi directives could create correct stack-unwind metadata for both MS and SysV versions, but if you just use your SysV assembly you're unlikely to have SEH-safe code that fully complies with the Windows ABI.


    Per-function attributes to specify calling convention

    If you declare every hand-written asm function's prototype with __attribute__((sysv_abi)), GCC will make code that calls it correctly.

    The GCC manual also says:

    Note, the ms_abi attribute for Microsoft Windows 64-bit targets currently requires the -maccumulate-outgoing-args option.

    Of course, any callback function-pointers you pass into the asm functions also need to be __attribute__((sysv_abi)), because the asm will call them using the SysV convention. So if you want to pass the address of any Windows library function, you'll need to write your own wrapper for it which uses that attribute in its definition.


    Mixing ABIs this way comes at some performance overhead, because the SysV ABI allows clobbering all XMM regs (as well as RSI and RDI), but MS-ABI doesn't, so every MS-ABI function that calls any SysV-ABI function needs to save/restore XMM6-15 (and RSI/RDI). This also bloats those callers, which can be mitigated by having GCC use a helper function to save / restore those mismatched registers:

    -mcall-ms2sysv-xlogues
    Due to differences in 64-bit ABIs, any Microsoft ABI function that calls a System V ABI function must consider RSI, RDI and XMM6-15 as clobbered. By default, the code for saving and restoring these registers is emitted inline, resulting in fairly lengthy prologues and epilogues. Using -mcall-ms2sysv-xlogues emits prologues and epilogues that use stubs in the static portion of libgcc to perform these saves and restores, thus reducing function size at the cost of a few extra instructions.


    Example (on Godbolt)

    #define SYSV __attribute__((sysv_abi))
    
    SYSV int foo(int x, double);
    
    // no attribute so this function is the default MS-ABI
    int bar(int *p, double d) {
        *p = 0;                    // incoming arg in RCX
        int tmp = foo(p[12], d);
        *p = tmp;
        return tmp;
    }
    

    Built with -O3 -mabi=ms -Wall -maccumulate-outgoing-args using GCC 11.2 on Linux; -mabi=ms there is a rough simulation of building with Cygwin or MinGW.

    bar:
            push    rdi
            movapd  xmm0, xmm1                  # SysV wants first FP arg in XMM0
            push    rsi                         # save RDI/RSI, call-preserved in MS-ABI
            push    rbx
            mov     rbx, rcx
            sub     rsp, 160                     # spill space for xmm6-15 (10 x 16 bytes); a SysV callee needs no shadow space
            mov     DWORD PTR [rcx], 0          # deref pointer arg
            mov     edi, DWORD PTR [rcx+48]     # load a new 1st arg
            movaps  XMMWORD PTR [rsp], xmm6     # save the MS-abi call-preserved regs
            movaps  XMMWORD PTR [rsp+16], xmm7
            movaps  XMMWORD PTR [rsp+32], xmm8
            movaps  XMMWORD PTR [rsp+48], xmm9
            movaps  XMMWORD PTR [rsp+64], xmm10
            movaps  XMMWORD PTR [rsp+80], xmm11
            movaps  XMMWORD PTR [rsp+96], xmm12
            movaps  XMMWORD PTR [rsp+112], xmm13
            movaps  XMMWORD PTR [rsp+128], xmm14
            movaps  XMMWORD PTR [rsp+144], xmm15
            call    foo
            movaps  xmm6, XMMWORD PTR [rsp]
            movaps  xmm7, XMMWORD PTR [rsp+16]
            movaps  xmm8, XMMWORD PTR [rsp+32]
            movaps  xmm9, XMMWORD PTR [rsp+48]
            mov     DWORD PTR [rbx], eax
            movaps  xmm10, XMMWORD PTR [rsp+64]
            movaps  xmm11, XMMWORD PTR [rsp+80]
            movaps  xmm12, XMMWORD PTR [rsp+96]
            movaps  xmm13, XMMWORD PTR [rsp+112]
            movaps  xmm14, XMMWORD PTR [rsp+128]
            movaps  xmm15, XMMWORD PTR [rsp+144]
            add     rsp, 160
            pop     rbx
            pop     rsi
            pop     rdi
            ret
    

    Note that all this XMM save/restore would still happen even if there were no double or float args; the caller has to assume the callee might be using SIMD or FP internally. Fortunately only the low 128 bits (the XMM part) of the high vector regs are call-preserved, even with AVX enabled, so the stack space usage doesn't get worse.

    If I'd used a __m128d arg, x64 fastcall (the default with ms_abi) would pass it by reference, pointed to by RDX. vectorcall would have passed it in XMM1 like a double (not xmm0 because arg-register numbering isn't independent between int and FP in MS-ABI). I don't know how to use x64 vectorcall with GCC. __attribute__((vectorcall)) is ignored if used on bar.

    Compare that with both parts compiled as MS-ABI (commenting out the SYSV on the prototype):

    bar:
            push    rbx                        # save a call-preserved reg (and realign the stack)
            mov     rbx, rcx                     # save incoming arg around function call
            sub     rsp, 32                    # reserve shadow space
            mov     DWORD PTR [rcx], 0           # deref the arg
            mov     ecx, DWORD PTR [rcx+48]    # load a new 1st arg
                                               # 2nd arg still in XMM1
            call    foo
            mov     DWORD PTR [rbx], eax       # store return value to *p
            add     rsp, 32
            pop     rbx
            ret                                # and return with it in EAX
    
    Clang ignores -mabi=ms, but respects __attribute__((ms_abi))

    So a Windows build of clang that defaults to MS-ABI can probably use __attribute__((sysv_abi)).