Passing structs by-value while conforming to the C calling convention in LLVM IR

I would like to pass structs by value between C++ and a JIT-compiled LLVM program. I've seen a lot of discussion about this, including a few questions on SO. I've read that I need to do something called "argument coercion" if I want my program to really pass by value. Using the byval and sret parameter attributes looks like the easy cross-platform solution. It's still a bit of a pain, because the C++ code has to remember to pass pointers instead of values (although the calling code is C++, so I could hide that with some templating magic).

The more I read about this problem, the less I seem to understand it. Calling convention is a platform-specific issue that should be dealt with by the code generator, right? I don't understand why the platform-specific code generator doesn't just deal with the platform-specific way of handling structs (while conforming to the platform's C ABI). The front-end should be platform-agnostic!

Is there a pass that does argument coercion for me? A pass that visits every function declaration and every function call and transforms all of the structs so that they are compatible with the platform's C ABI? I feel like that's something all frontends would be using if it existed, and Clang doesn't use it, so maybe it's not possible. Why isn't this a viable solution? If a pass could just deal with this, then I would expect it to be part of LLVM.

I don't understand why every frontend has to do argument coercion. I don't even understand how to do argument coercion. I've seen a few instances of people taking the Clang code generation code and factoring out the part that does argument coercion. Unfortunately, this seems like the best solution if I want real C ABI compatibility. The fact that it's even possible to reuse part of another frontend for a completely different language makes me keep wondering why this has to be done in the frontend.

Something has to be done about this! We can't just keep writing the same C ABI compatibility code in every frontend. It's ridiculous! Maybe I simply don't understand.

Could someone clear this up for me? I'm thinking about using byval and sret simply because it's easier than modifying the Clang code generator. Is there an easier way?

Solution

When passing around structs by value in LLVM IR, you have to make up your own rules. I chose the simplest set of rules I could.

Let's say I have a program like this:

struct MyStruct {
  int a;
  char b, c, d, e;
};

MyStruct identityImpl(MyStruct s) {
  return s;
}

MyStruct identity(MyStruct s) {
  return identityImpl(s);
}

The LLVM IR I generate for this program is equivalent to this C++, where both the argument and the return value are passed through pointers:

void identityImpl(MyStruct *ret, const MyStruct *s) {
  MyStruct localS = *s;
  *ret = localS;
}

void identity(MyStruct *ret, const MyStruct *s) {
  MyStruct localS = *s;
  MyStruct localRet;
  identityImpl(&localRet, &localS);
  *ret = localRet;
}
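
On the C++ side, calling a function lowered this way means passing pointers by hand, which is the part I thought some templating could hide. Here is a minimal sketch of that idea. The helper names callByPointer and identityByValue are made up for illustration, and identityImpl is an ordinary C++ function standing in for a pointer obtained from the JIT:

```cpp
#include <cassert>

struct MyStruct {
  int a;
  char b, c, d, e;
};

// Stand-in for the JIT'd function (assumption: in a real program this
// would be a function pointer looked up from the JIT). The result and
// the argument are both passed through pointers.
void identityImpl(MyStruct *ret, const MyStruct *s) {
  MyStruct localS = *s;
  *ret = localS;
}

// Hypothetical helper: adapts a pointer-based function to a by-value
// call so the caller does not have to manage the pointers itself.
template <typename Ret, typename Arg>
Ret callByPointer(void (*fn)(Ret *, const Arg *), const Arg &arg) {
  Ret result{};      // storage for the return slot
  fn(&result, &arg); // the callee writes its result through the pointer
  return result;
}

// Usage: the caller sees an ordinary by-value function.
MyStruct identityByValue(MyStruct s) {
  return callByPointer(&identityImpl, s);
}

// Small self-check showing that the struct survives the round trip.
void checkIdentityByValue() {
  MyStruct in{7, 'a', 'b', 'c', 'd'};
  MyStruct out = identityByValue(in);
  assert(out.a == 7 && out.b == 'a' && out.e == 'd');
}
```

This only works because both sides agree on the same pointer-based rule. It does not make the JIT'd function compatible with a C function that takes MyStruct by value.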

This pointer-based convention is not the most efficient way of passing the struct, because MyStruct fits in a single 64-bit register. However, the optimizer can remove localS and use s directly if it can prove that localS is never written to. Both of those functions optimize down to a single call to memcpy.
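
For comparison, here is a rough sketch of what "argument coercion" would mean for this struct. On x86-64 System V, an 8-byte struct made only of integer fields is passed in one general-purpose register, and Clang lowers it to a single 64-bit integer. The sketch imitates that in plain C++. The names toBits, fromBits and identityCoerced are hypothetical, and the static_assert encodes the assumption that int is 4 bytes so that MyStruct is exactly 8 bytes:

```cpp
#include <cstdint>
#include <cstring>

struct MyStruct {
  int a;
  char b, c, d, e;
};

// Assumption: 4-byte int, so the struct is 8 bytes with no padding.
static_assert(sizeof(MyStruct) == sizeof(std::uint64_t),
              "this sketch assumes an 8-byte MyStruct");

// Pack the struct's bytes into one 64-bit integer (caller side).
std::uint64_t toBits(const MyStruct &s) {
  std::uint64_t bits;
  std::memcpy(&bits, &s, sizeof bits);
  return bits;
}

// Unpack the 64-bit integer back into the struct (callee side).
MyStruct fromBits(std::uint64_t bits) {
  MyStruct s;
  std::memcpy(&s, &bits, sizeof s);
  return s;
}

// The coerced form of identity: the struct travels as a plain integer,
// so it can stay in a register instead of going through memory.
std::uint64_t identityCoerced(std::uint64_t s) {
  return s;
}
```

The catch is that this mapping is specific to one struct layout on one platform. A larger struct, or one containing floats, is classified differently, which is why a frontend that wants real C ABI compatibility needs per-target rules.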

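This only took half a day. Going the Clang route probably would have taken at least a week. I still think it's rather unfortunate that I had to do this, but I understand the problem now. The platform's C ABI specifies how structs are passed in terms of C types, and LLVM IR doesn't carry enough of that type information for the code generator to apply those rules on its own, so the frontend has to make the decision.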