There are lots of discussions here about when RVO can be done, but not much about when it actually is done. As stated many times, the Standard does not guarantee RVO, but is there a way to guarantee that either RVO succeeds or the corresponding code fails to compile?
So far I have partially succeeded in making the code issue link errors when RVO fails. For this I declare the copy and move constructors without defining them. Obviously this is neither robust nor feasible in the not-so-rare cases where I need to implement one or both of them, i.e. x(x&&) and x(x const&).
This brings me to my second question: Why have the compiler writers chosen to enable RVO when user-defined copy constructors are in place but not when only the default copy constructors are present?
Third question: Is there some other way to enable RVO for plain data structures?
Last question (promise): Do you know of any compiler that makes my test code behave differently from what I observed with gcc and clang?
Here is some example code for gcc 4.6, gcc 4.8 and clang 3.3 that shows the problem. The behavior does not depend on general optimization or debug settings. Of course, the option -fno-elide-constructors does what it says, i.e. turns RVO off.
#include <iostream>
using namespace std;
struct x
{
    x () { cout << "original x address" << this << endl; }
};
x make_x ()
{
    return x();
}
struct y
{
    y () { cout << "original y address" << this << endl; }
    // Any of the next two constructors will enable RVO even if only
    // declared but not defined. Default constructors will not do!
    y(y const & rhs);
    y(y && rhs);
};
y make_y ()
{
    return y();
}
int main ()
{
    auto x1 = make_x();
    cout << "copy of x address" << &x1 << endl;
    auto y1 = make_y();
    cout << "copy of y address" << &y1 << endl;
}
Output:
original x address0x7fff8ef01dff
copy of x address0x7fff8ef01e2e
original y address0x7fff8ef01e2f
copy of y address0x7fff8ef01e2f
RVO also does not seem to work with plain data structures:
#include <iostream>
using namespace std;
struct x
{
    int a;
};
x make_x ()
{
    x tmp;
    cout << "original x address" << &tmp << endl;
    return tmp;
}
int main ()
{
    auto x1 = make_x();
    cout << "copy of x address" << &x1 << endl;
}
Output:
original x address0x7fffe7bb2320
copy of x address0x7fffe7bb2350
UPDATE: Note that some optimizations are very easily confused with RVO. Constructor helpers like make_x are an example. See this example where the optimization is actually enforced by the standard.
The problem is that the compiler is doing too many optimizations :)
First of all, I disabled the inlining of make_x(); otherwise we cannot distinguish between RVO and inlining. However, I did put the rest into an anonymous namespace so that external linkage does not interfere with any other compiler optimizations. (As evidence shows, external linkage can prevent inlining, for example, and who knows what else...) I also rewrote the input-output so that it uses printf(); otherwise the generated assembly code would be cluttered with all the iostream stuff. So the code:
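#include <cstdio>
using namespace std;
namespace {
struct x {
    //int dummy[1024];
    // %p expects a void*, hence the cast
    x() { printf("original x address %p\n", static_cast<void*>(this)); }
};
// gcc/clang-specific attribute: keeps make_x() from being inlined
__attribute__((noinline)) x make_x() {
    return x();
}
} // namespace
int main() {
    auto x1 = make_x();
    printf("copy of x address %p\n", static_cast<void*>(&x1));
}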
I analyzed the generated assembly code with a colleague of mine, as my understanding of the gcc-generated assembly is very limited. Later today, I used clang with the -S -emit-llvm flags to generate LLVM assembly, which I personally find much nicer and easier to read than the X86 Assembly/GAS Syntax. It didn't matter which compiler was used; the conclusions are the same.
I rewrote the generated assembly in C++; it looks roughly like this if x is empty:
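#include <cstdio>
using namespace std;
struct x { };
// Nothing is returned: make_x() only prints the address of its own local.
void make_x() {
    x tmp;
    printf("original x address %p\n", static_cast<void*>(&tmp));
}
int main() {
    x x1;
    make_x();
    printf("copy of x address %p\n", static_cast<void*>(&x1));
}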
If x is big (the int dummy[1024]; member uncommented):
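#include <cstdio>
using namespace std;
struct x { int dummy[1024]; };
// The caller passes the address of its own object; nothing is copied.
void make_x(x* x1) {
    printf("original x address %p\n", static_cast<void*>(x1));
}
int main() {
    x x1;
    make_x(&x1);
    printf("copy of x address %p\n", static_cast<void*>(&x1));
}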
It turns out that make_x() only has to print some valid, unique address if the object is empty, so it has the liberty to print an address pointing into its own stack frame. There is also nothing to be copied and nothing to return from make_x().
If you make the object bigger (by adding the int dummy[1024]; member, for example), it gets constructed in place, so RVO does kick in, and only the object's address is passed to make_x() to be printed. No object gets copied, nothing gets moved.
If the object is empty, the compiler can decide not to pass an address to make_x() (what a waste of resources that would be :) ) but let make_x() make up a unique, valid address from its own stack. When this optimization happens is somewhat fuzzy and hard to reason about (that is what you see with y), but it really doesn't matter.
RVO appears to happen consistently in those cases where it matters. And, as my earlier confusion shows, even the whole make_x() function can get inlined, so there is no return value to be optimized away in the first place.