
x86_64 incorrect calling convention when calling function


I'm relatively new to LLVM, and I'm attempting to generate LLVM IR that calls a C function (growStringDictionary). This is on x86_64 Linux, using LLVM 12:

$ llc-12 --version
Ubuntu LLVM version 12.0.1

  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: broadwell

The function (defined in C++ as extern "C", compiled with clang 12):

#include <cstdint>

struct StringDictionary {
    uint32_t* base;
    uint32_t elementSize;
    uint32_t rowCount;
    uint32_t wordsCapacity;
};

extern "C" {
StringDictionary growStringDictionary(StringDictionary dict,
                                      uint32_t neededWordsCapacity);
}

The function takes the StringDictionary object by value, and according to the x86_64 ABI (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf, section 3.2.3, "Parameter Passing") it should be passed on the stack. (The object is 24 bytes, i.e. three eightbytes including 4 bytes of tail padding. That is more than two eightbytes, and its eightbytes are all class INTEGER rather than SSE or SSEUP, so the "post merger cleanup" rules turn the whole argument into class MEMORY.) A cursory look at the disassembly confirms that this is indeed the case:

Dump of assembler code for function growStringDictionary(rockset::jit::StringDictionary, uint32_t):
   0x00007ffff7f98f70 <+0>: push   %rbp
   0x00007ffff7f98f71 <+1>: mov    %rsp,%rbp
   0x00007ffff7f98f74 <+4>: push   %rbx
   0x00007ffff7f98f75 <+5>: and    $0xffffffffffffffe0,%rsp
   0x00007ffff7f98f79 <+9>: sub    $0x1c0,%rsp
   0x00007ffff7f98f80 <+16>:    mov    %rsp,%rbx
   0x00007ffff7f98f83 <+19>:    mov    %esi,0x15c(%rbx)
   0x00007ffff7f98f89 <+25>:    mov    %rdi,0x160(%rbx)
[...]

%rdi holds the address where the return value will be written (the struct is returned in memory too) and %esi holds the uint32_t neededWordsCapacity argument; no other argument-passing registers are used.

This is all fine so far, but when I call this function from my generated IR, the generated code passes all arguments in registers. Here are the relevant parts of the IR:

  %83 = call { i32*, i32, i32, i32 } @growStringDictionary({ i32*, i32, i32, i32 } %70, i32 %73)
[...]
declare { i32*, i32, i32, i32 } @growStringDictionary({ i32*, i32, i32, i32 }, i32)

Note that the calling convention is the default one (ccc), not something like fastcc.

The generated code (the JIT I'm trying to use and llc produce the same result) tries to pass the struct argument in registers. Here's the output from llc -O0 (-O3 is similar):

        movl    148(%rsp), %r9d                 # 4-byte Reload
        movl    140(%rsp), %r8d                 # 4-byte Reload
        movl    136(%rsp), %ecx                 # 4-byte Reload
        movl    132(%rsp), %edx                 # 4-byte Reload
        movq    120(%rsp), %rsi                 # 8-byte Reload
        leaq    376(%rsp), %rdi
        callq   growStringDictionary@PLT

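Unsurprisingly, my code segfaults: the caller splits the struct across %rsi, %edx, %ecx and %r8d and passes neededWordsCapacity in %r9d, while the callee expects the struct on the stack and neededWordsCapacity in %esi.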

I'm surprised that llc generated code that doesn't match the ABI. Are there any attributes I need to put on the function declaration, or on the type definition, or is there anything else that I'm missing?


Solution

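  • It turns out that this part of the calling convention is handled by the frontend, not by the LLVM backend: clang decides how each C-level aggregate is passed and encodes that decision in the IR signature (for example as a byval pointer, or as coerced scalar arguments). Presumably the same goes for things like passing non-trivial C++ objects.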

    Take this example file:

    #include <stdint.h>
    
    struct A {
      uint32_t* p;
      uint32_t a;
      uint32_t b;
    };
    
    struct B {
      uint32_t* p;
      uint32_t a;
      uint32_t b;
      uint32_t c;
    };
    
    uint32_t addA(struct A x) {
      return x.a + x.b;
    }
    
    uint32_t addB(struct B x) {
      return x.a + x.b + x.c;
    }
    

    clang -S -emit-llvm says:

    %struct.A = type { i32*, i32, i32 }
    %struct.B = type { i32*, i32, i32, i32 }
    
    ; Function Attrs: noinline nounwind optnone uwtable
    define dso_local i32 @addA(i32* %0, i64 %1) #0 {
      %3 = alloca %struct.A, align 8
      %4 = bitcast %struct.A* %3 to { i32*, i64 }*
      %5 = getelementptr inbounds { i32*, i64 }, { i32*, i64 }* %4, i32 0, i32 0
      store i32* %0, i32** %5, align 8
      %6 = getelementptr inbounds { i32*, i64 }, { i32*, i64 }* %4, i32 0, i32 1
      store i64 %1, i64* %6, align 8
      %7 = getelementptr inbounds %struct.A, %struct.A* %3, i32 0, i32 1
      %8 = load i32, i32* %7, align 8
      %9 = getelementptr inbounds %struct.A, %struct.A* %3, i32 0, i32 2
      %10 = load i32, i32* %9, align 4
      %11 = add i32 %8, %10
      ret i32 %11
    }
    
    ; Function Attrs: noinline nounwind optnone uwtable
    define dso_local i32 @addB(%struct.B* byval(%struct.B) align 8 %0) #0 {
      %2 = getelementptr inbounds %struct.B, %struct.B* %0, i32 0, i32 1
      %3 = load i32, i32* %2, align 8
      %4 = getelementptr inbounds %struct.B, %struct.B* %0, i32 0, i32 2
      %5 = load i32, i32* %4, align 4
      %6 = add i32 %3, %5
      %7 = getelementptr inbounds %struct.B, %struct.B* %0, i32 0, i32 3
      %8 = load i32, i32* %7, align 8
      %9 = add i32 %6, %8
      ret i32 %9
    }
    

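    Note that the argument to addB has become %struct.B* byval(%struct.B), indicating that it is passed on the stack, while addA's 16-byte struct fits in two INTEGER eightbytes and is coerced into two register arguments (i32* and i64). So to call growStringDictionary correctly, the generated IR has to do what clang does: pass the struct as a pointer marked byval and, since the 24-byte return value is also class MEMORY, declare the function as returning void and pass the result slot as a pointer marked sret.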