Passing structs by-value in LLVM IR


I'm generating LLVM IR for JIT purposes, and I notice that LLVM's calling conventions don't seem to match the C calling conventions when aggregate values are involved. For instance, when I declare a function as taking a {i32, i32} (that is, a struct {int a, b;} in C terms) parameter, it appears to pass each of the struct elements in its own x86-64 GPR to the function, even though the x86-64 ABI specifies (sec. 3.2.3) that such a struct should be packed in a single 64-bit GPR.

This is in spite of LLVM's documentation claiming to match the C calling convention by default:

“ccc” - The C calling convention

This calling convention (the default if no other calling convention is specified) matches the target C calling conventions. This calling convention supports varargs function calls and tolerates some mismatch in the declared prototype and implemented declaration of the function (as does normal C).

My question, then, is: Am I doing something wrong to cause LLVM to not match the C calling convention, or is this known behavior? (At the very least, the documentation seems to be wrong, no?)

I can find only a few references to the issue on the web, such as this bug report from 2007, which claims to be fixed. It also claims that "First, LLVM has no way to deal with aggregates as singular Value*'s". I don't know whether that was true in 2007, but it doesn't seem to be true now, given the extractvalue/insertvalue instructions. I also found this SO question, whose second (non-accepted) answer seems to accept implicitly that argument coercion has to be done manually.

I'm currently building code for doing argument coercion in my IR generator, but it is complicating my design considerably (not to mention making it architecture-specific), so if I'm simply doing something wrong, I'd rather know about that. :)


Solution

  • LLVM's support for C-compatible calling conventions is extremely limited, I'm afraid. Several folks have wished for more direct calling-convention support in LLVM (or a related library), but so far this has not emerged. That logic currently lives in the C-language frontend (Clang, for example).

    What LLVM provides is a mapping from specific LLVM IR types to specific C ABI lowerings for a specific CPU backend. You can see which IR types to use for a given C function by using Clang to emit LLVM IR, much as the comment above suggests: https://c.compiler-explorer.com/z/8jWExWPYq

    struct S { int x, y; };
    
    void f(struct S s);
    
    void test(int x, int y) {
        struct S s = {x, y};
        f(s);
    }
    

    Turns into:

    define dso_local void @test(i32 noundef %0, i32 noundef %1) #0 {
      %3 = alloca i32, align 4
      %4 = alloca i32, align 4
      %5 = alloca %struct.S, align 4
      store i32 %0, ptr %3, align 4
      store i32 %1, ptr %4, align 4
      %6 = getelementptr inbounds %struct.S, ptr %5, i32 0, i32 0
      %7 = load i32, ptr %3, align 4
      store i32 %7, ptr %6, align 4
      %8 = getelementptr inbounds %struct.S, ptr %5, i32 0, i32 1
      %9 = load i32, ptr %4, align 4
      store i32 %9, ptr %8, align 4
      %10 = load i64, ptr %5, align 4
      call void @f(i64 %10)
      ret void
    }
    
    declare void @f(i64) #1
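
The `load i64` in the IR above reinterprets the eight bytes of `struct S` as a single integer before the call. A minimal C sketch of that same reinterpretation follows, assuming a target where `int` is 4 bytes and `struct S` has no padding (as on x86-64). The helper names `pack_s` and `unpack_s` are made up for illustration and are not part of LLVM or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct S { int x, y; };

/* Hypothetical helper: copy the struct's bytes into a 64-bit integer,
   mirroring the `load i64, ptr %5` in the IR above. */
static uint64_t pack_s(struct S s) {
    uint64_t bits;
    /* Assumption: 8-byte struct with no padding. */
    assert(sizeof s == sizeof bits);
    memcpy(&bits, &s, sizeof bits);
    return bits;
}

/* Hypothetical inverse: recover the fields on the callee side
   from the single 64-bit argument. */
static struct S unpack_s(uint64_t bits) {
    struct S s;
    memcpy(&s, &bits, sizeof s);
    return s;
}
```

An IR generator doing its own coercion would emit the equivalent of `pack_s` at each call site and of `unpack_s` in the callee's prologue, for this one struct shape on this one target.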
    

    Sadly, mapping specific C types to the LLVM IR that will match the ABI when lowered for a platform takes non-trivial logic. Outside of extremely simple types (basic C integer types, pointers, float, double, maybe a few others), these mappings aren't even portable between the different architecture ABIs/calling conventions.

    FWIW, the situation is even worse for C++, which has much more complexity here, I'm afraid.

    So your choices are to:

    1. Use a very small set of types in a limited range of signatures that you build custom logic to lower correctly into LLVM IR, checking that it matches what Clang (or another C frontend) produces in every case.
    2. Directly use Clang or another C frontend to emit the LLVM IR.
    3. Take on the major project of extracting this ABI/calling-convention logic from Clang into a re-usable library. There has in the past been appetite for this in the LLVM/Clang communities, but it is a very large and complex undertaking from my understanding. There are some partial efforts (specifically for C and JITs) that you may be able to find and re-use, but I don't have a good memory of where all those are.
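
To give a feel for what the custom logic in option 1 might look like, here is a rough sketch of a classifier for one narrow case only: structs made solely of integer or pointer fields under the x86-64 System V ABI. The enum and function names are hypothetical. It ignores floating-point fields, unions, bit-fields, running out of argument registers, and every other target, so its results would still need checking against what Clang emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how an integer-only struct is passed. */
enum pass_kind { PASS_ONE_GPR, PASS_TWO_GPRS, PASS_IN_MEMORY };

/* Assumptions: x86-64 System V ABI; the struct contains only integer or
   pointer fields, all naturally aligned; enough argument registers are
   still free. Anything else is outside this sketch. */
static enum pass_kind classify_int_struct(size_t size_in_bytes) {
    assert(size_in_bytes > 0);
    if (size_in_bytes <= 8)
        return PASS_ONE_GPR;    /* one eightbyte, e.g. struct {int a, b;} */
    if (size_in_bytes <= 16)
        return PASS_TWO_GPRS;   /* two eightbytes */
    return PASS_IN_MEMORY;      /* larger aggregates go on the stack */
}
```

Even this small case shows why the logic is architecture-specific: the 8- and 16-byte thresholds come from the x86-64 System V rules and do not carry over to other ABIs.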