I've stumbled across an oddity in MSVC's codegen regarding structures that are used as return values. Consider the following code (live demo here):
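#include <cstdint>

// Assumed definition; the snippet as posted does not show how NOINLINE is defined.
#define NOINLINE __declspec(noinline)

struct Result
{
    uint64_t value;
};

// Free function: Result comes back in RAX.
Result makeResult(uint64_t value)
{
    return { value };
}

struct ResultFactory
{
    // Non-static member function: Result is written through a hidden pointer.
    NOINLINE Result MakeResult(uint64_t value) const
    {
        return { value };
    }
};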
We have a struct which perfectly fulfils the x64 ABI's conditions for being returned in RAX. And as long as the free function is used, this is the case:
value$ = 8
Result makeResult(unsigned __int64) PROC ; makeResult, COMDAT
        mov     rax, rcx
        ret     0
Result makeResult(unsigned __int64) ENDP ; makeResult
Now when we look at the member-function, it looks slightly different:
Result ResultFactory::MakeResult(unsigned __int64)const PROC ; ResultFactory::MakeResult, COMDAT
        mov     QWORD PTR [rdx], r8
        mov     rax, rdx
        ret     0
Result ResultFactory::MakeResult(unsigned __int64)const ENDP ; ResultFactory::MakeResult
Here, the compiler requires the caller to pass in a reference to the "Result" that will be written. You'd expect that in the first register, but it arrives in RDX, the second, since RCX holds this for member functions; that is what MSVC does anyway for member functions whose return value cannot go in RAX.
Why would that be the case? Is there any good reason for it? It seems to needlessly pessimise code-gen, and I really see no benefit to it. Having RCX always be this kind of makes sense, but always requiring a reference, even for primitive structs? This also means that there is unfortunately a very real difference between using a member function and a free function, as long as neither can be inlined. Or, in cases where a member function is used, it could be faster to just return a primitive type and bit_cast it across the function boundary (whether or not any of that matters is another question, but frankly it shouldn't be the case).
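To make the bit_cast idea concrete, here is a minimal sketch. The names MakeResultRaw, resultFromBits and checkRoundTrip are made up for illustration, and std::memcpy is used in place of std::bit_cast so that it also builds before C++20. The out-of-line member returns a plain integer, and a thin wrapper that is expected to be inlined turns it back into a Result:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Result
{
    uint64_t value;
};

// Hypothetical helper: reinterpret the 8 bytes of an integer as a Result.
// In C++20, std::bit_cast<Result>(raw) expresses the same thing.
inline Result resultFromBits(uint64_t raw)
{
    static_assert(sizeof(Result) == sizeof(uint64_t), "sizes must match");
    Result r;
    std::memcpy(&r, &raw, sizeof r);
    return r;
}

struct ResultFactory
{
    // Hypothetical out-of-line worker: it returns a plain integer, so no
    // user-defined type crosses the function boundary.
    uint64_t MakeResultRaw(uint64_t value) const
    {
        return value;
    }

    // Thin wrapper, meant to be inlined at the call site.
    Result MakeResult(uint64_t value) const
    {
        return resultFromBits(MakeResultRaw(value));
    }
};

// Small usage check: the value survives the round trip.
inline void checkRoundTrip()
{
    ResultFactory factory{};
    assert(factory.MakeResult(42).value == 42);
}
```

Whether this is actually faster depends on the wrapper being inlined, so it would be worth checking the generated assembly before relying on it.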
Clang/GCC seem to do it "right". I'm not 100% sure if this is just an MSVC quirk, or actually the x64 Windows calling convention (MSDN doesn't really say anything about C++ specifically). Anyone got a clue what's going on here?
EDIT: As pointed out by @Turtlefight, this is indeed mandated by the Windows ABI. My follow-up, or rewording of this question, would then be: why does the Windows ABI make this distinction, when it seems to only lead to worse code-gen and makes the handling of global and member functions vastly different and thus more complex? In case anyone knows why it was designed that way.
This is required by the Windows x64 ABI.
Non-static member functions cannot return user-defined types by value in RAX; the caller always has to pass a pointer to storage for the return value.
Only static member functions and global functions can return user-defined types by value in RAX.
x64 calling convention - return values
Return Values
User-defined types can be returned by value from global functions and static member functions. To return a user-defined type by value in RAX, it must have a length of 1, 2, 4, 8, 16, 32, or 64 bits. It must also have no user-defined constructor, destructor, or copy assignment operator. It can have no private or protected non-static data members, and no non-static data members of reference type. It can't have base classes or virtual functions. And, it can only have data members that also meet these requirements. (This definition is essentially the same as a C++03 POD type. Because the definition has changed in the C++11 standard, we don't recommend using std::is_pod for this test.)
Otherwise, the caller must allocate memory for the return value and pass a pointer to it as the first argument. The remaining arguments are then shifted one argument to the right. The same pointer must be returned by the callee in RAX.
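As a rough illustration of that last paragraph, the member function from the question behaves as if it had been written by hand like the sketch below. MakeResult_lowered and callLowered are made-up names, and the register comments are read off the MSVC listing in the question rather than guaranteed by this code:

```cpp
#include <cassert>
#include <cstdint>

struct Result
{
    uint64_t value;
};

struct ResultFactory
{
};

// Hypothetical hand-lowered form of ResultFactory::MakeResult:
//   self  -> RCX (the this pointer)
//   ret   -> RDX (hidden pointer to caller-allocated storage)
//   value -> R8
// The same pointer is handed back as the return value (RAX).
inline Result* MakeResult_lowered(const ResultFactory* self, Result* ret, uint64_t value)
{
    (void)self;          // the factory has no state, so this is unused
    ret->value = value;  // corresponds to: mov QWORD PTR [rdx], r8
    return ret;          // corresponds to: mov rax, rdx
}

// What a call site effectively does: allocate the return slot, pass its
// address, then read the result back through the returned pointer.
inline uint64_t callLowered(uint64_t value)
{
    ResultFactory factory{};
    Result slot;
    Result* returned = MakeResult_lowered(&factory, &slot, value);
    assert(returned == &slot);
    return returned->value;
}
```

The extra store and the reload through memory at the call site are the cost the question is about.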
clang will also generate the same code if you ask it to compile for the Microsoft x64 ABI (with -target x86_64-pc-windows-msvc -fc++-abi=microsoft):
"?MakeResult@ResultFactory@@QEBA?AUResult@@_K@Z":
        mov     rax, rdx
        mov     qword ptr [rdx], r8
        ret
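Going by the rule quoted above, one possible workaround is to move the out-of-line work into a static member function that takes the object explicitly, and keep the non-static member as a thin forwarding wrapper. This is only a sketch: MakeResultStatic and the bias member are made up for illustration, and I have not checked the assembly MSVC generates for it:

```cpp
#include <cassert>
#include <cstdint>

struct Result
{
    uint64_t value;
};

struct ResultFactory
{
    uint64_t bias = 0; // hypothetical state, so the object is actually used

    // Static member function: according to the quoted rule, a type like
    // Result is eligible to be returned in RAX from here.
    static Result MakeResultStatic(const ResultFactory& self, uint64_t value)
    {
        return { value + self.bias };
    }

    // Non-static convenience wrapper. If it gets inlined, only the static
    // function is called across the function boundary.
    Result MakeResult(uint64_t value) const
    {
        return MakeResultStatic(*this, value);
    }
};
```

Callers keep writing factory.MakeResult(x); only the function that really gets called changes shape.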