So my understanding of move semantics is that they allow you to overload functions for use with temporary values (rvalues) and avoid potentially expensive copies (by moving the state from an unnamed temporary into your named lvalue).
My question is why do we need special semantics for this? Why couldn't a C++98 compiler elide these copies, since it's the compiler that determines whether a given expression is an lvalue or an rvalue? As an example:
#include <string>

void func(const std::string& s) {
    // Do something with s
}
int main() {
    func(std::string("abc") + std::string("def"));
}
Even without C++11's move semantics, the compiler should still be able to determine that the expression passed to func() is an rvalue, and thus the copy from a temporary object is unnecessary. So why have the distinction at all? It seems like this application of move semantics is essentially a variant of copy elision or other similar compiler optimizations.
As another example, why bother having code like the following?
#include <string>

void func(const std::string& s) {
    // Do something with lvalue string
}
void func(std::string&& s) {
    // Do something with rvalue string
}
int main() {
    std::string s("abc");
    // Presumably calls func(const std::string&) overload
    func(s);
    // Presumably calls func(std::string&&) overload
    func(std::string("abc") + std::string("def"));
}
It seems like the const std::string& overload could handle both cases: lvalues as usual, and rvalues as a const reference (since temporary expressions are sort of const by definition). Since the compiler knows when an expression is an lvalue or an rvalue, it could decide whether to elide the copy in the case of an rvalue.
Basically, why are move semantics considered special and not just a compiler optimization that could have been performed by pre-C++11 compilers?
Move operations do not exactly elide temporary copies.
The same number of temporaries exists. It's just that where the copy constructor would typically have been called, the move constructor is called instead, and it is allowed to cannibalize the original rather than make an independent copy. This may sometimes be vastly more efficient.
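To make the "cannibalize the original" part concrete, here is a minimal sketch. `Buffer` and `demo_move_vs_copy` are hypothetical names made up for illustration, not anything from the standard library. The copy constructor allocates and duplicates, while the move constructor just takes over the existing allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical type for illustration: owns a heap-allocated char array.
struct Buffer {
    std::size_t size;
    char* data;

    explicit Buffer(std::size_t n) : size(n), data(new char[n]()) {}

    // Copy: allocate a fresh array and duplicate every byte.
    Buffer(const Buffer& other) : size(other.size), data(new char[other.size]) {
        std::memcpy(data, other.data, size);
    }

    // Move: steal the pointer and leave the source empty but destructible.
    Buffer(Buffer&& other) noexcept : size(other.size), data(other.data) {
        other.size = 0;
        other.data = nullptr;
    }

    // Assignment is left out to keep the sketch short.
    Buffer& operator=(const Buffer&) = delete;
    Buffer& operator=(Buffer&&) = delete;

    ~Buffer() { delete[] data; }  // delete[] on nullptr is a no-op
};

inline void demo_move_vs_copy() {
    Buffer a(1024);
    const char* original = a.data;

    Buffer c(a);                  // copy constructor: independent allocation
    assert(c.data != a.data);
    assert(c.size == a.size);

    Buffer b(std::move(a));       // move constructor: reuses a's allocation
    assert(b.data == original);
    assert(a.data == nullptr);    // a is still a live object, just emptied
}
```

Note that `a`, `b` and `c` are three distinct objects either way; only the cost of constructing `b` changes.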
The C++ formal object model is not at all modified by move semantics. Objects still have a well-defined lifetime, starting at some particular address, and ending when they are destroyed there. They never "move" during their lifetime. When they are "moved from", what is really happening is that the guts are scooped out of an object that is scheduled to die soon, and placed efficiently in a new object. It may look like they moved, but formally they didn't, as that would totally break C++.
Being moved from is not death. Move is required to leave objects in a "valid state" in which they are still alive, and the destructor will always be called later.
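A small sketch of that point, assuming a hypothetical `Tracer` type whose destructor bumps a counter (the names `Tracer`, `destroyed_count` and `destructor_calls_after_move` are invented for this example). It uses `std::unique_ptr` as the payload because its moved-from state is guaranteed to be null.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Counter shared by all Tracer objects (hypothetical helper).
inline int& destroyed_count() {
    static int n = 0;
    return n;
}

// Hypothetical tracer: records every destructor call.
struct Tracer {
    std::unique_ptr<int> payload;

    explicit Tracer(int v) : payload(std::make_unique<int>(v)) {}
    Tracer(Tracer&&) noexcept = default;
    ~Tracer() { ++destroyed_count(); }
};

inline int destructor_calls_after_move() {
    destroyed_count() = 0;
    {
        Tracer a(42);
        Tracer b(std::move(a));
        assert(a.payload == nullptr);  // moved-from, but still a live object
        assert(*b.payload == 42);      // b now owns the allocation
    }   // both a and b are destroyed here
    return destroyed_count();
}
```

The function returns 2: the moved-from `a` gets its destructor call just like `b` does.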
Eliding copies is a totally different thing, where in some chain of temporary objects, some of the intermediates are skipped. Compilers are not required to elide copies in C++11 and C++14, but they are permitted to do so even when it would violate the "as-if" rule that usually guides optimization. That is, even if the copy constructor has side effects, the compiler may still skip some of the temporaries.
By contrast, "guaranteed copy elision" is a new C++17 feature, which means that the standard requires copy elision to take place in certain cases.
Move semantics and copy elision give two different approaches to enabling greater efficiency in these "chain of temporaries" scenarios. In move semantics, all the temporaries still exist, but instead of calling the copy constructor, we get to call a (hopefully) less expensive constructor, the move constructor. In copy elision, we get to skip some of the objects altogether.
Basically, why are move semantics considered special and not just a compiler optimization that could have been performed by pre-C++11 compilers?
Move semantics are not a "compiler optimization". They are a new part of the type system. Move semantics apply even when you compile with -O0 on gcc and clang. They cause different functions to be called, because the fact that an object is about to die is now "annotated" in the type of the reference. This allows "application level optimizations", but that is different from what the optimizer does.
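As a rough illustration of "different functions get called", here is a sketch with a hypothetical overload pair named `which` (not from the question). Which overload runs is decided by overload resolution on the value category of the argument, so the result is the same at any optimization level.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair: each returns a label naming the overload chosen.
inline const char* which(const std::string&) { return "lvalue"; }
inline const char* which(std::string&&)      { return "rvalue"; }

inline void demo_overloads() {
    std::string s("abc");

    // A named object is an lvalue: binds to const std::string&.
    assert(std::string(which(s)) == "lvalue");

    // The result of operator+ is a temporary: binds to std::string&&.
    assert(std::string(which(std::string("abc") + std::string("def"))) == "rvalue");

    // std::move is only a cast; it lets a named object select the && overload.
    assert(std::string(which(std::move(s))) == "rvalue");
}
```

A C++98 compiler had no `&&` overload to pick, so there was no place for the library author to write the cheaper "steal the guts" code path.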
Maybe you can think of it as a safety net. Sure, in an ideal world the optimizer would always eliminate every unnecessary copy. Sometimes, though, constructing a temporary is complex, involves dynamic allocations, and the compiler doesn't see through it all. In many such cases, you will be saved by move semantics, which might allow you to avoid making a dynamic allocation at all. That in turn may lead to generated code that is then easier for the optimizer to analyze.
Guaranteed copy elision is sort of like this: they found a way to formalize some of this "common sense" about temporaries, so that more code not only works the way you expect when it gets optimized, but is required to work the way you expect when it gets compiled, and does not call a copy constructor when you think there shouldn't really be a copy. So you can, for example, return non-copyable, non-movable types by value from a factory function. The compiler figures out that no copy happens much earlier in the process, before it even gets to the optimizer. This is really the next iteration of this series of improvements.
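Here is a minimal sketch of the factory-function case. It needs a C++17 (or later) compiler; under C++14 the same code is rejected because the deleted constructors would have to be accessible. `Pinned`, `make_pinned` and `demo_guaranteed_elision` are hypothetical names for this example.

```cpp
#include <cassert>

// Hypothetical type that can be neither copied nor moved.
struct Pinned {
    int value;
    explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// C++17: the returned prvalue initializes the caller's object directly,
// so no copy or move constructor is needed.
inline Pinned make_pinned(int v) {
    return Pinned(v);
}

inline int demo_guaranteed_elision() {
    Pinned p = make_pinned(7);  // constructed in place; nothing is copied or moved
    assert(p.value == 7);
    return p.value;
}
```

Note that this only covers returning a prvalue. Returning a named local variable (NRVO) is still an optional elision, so it would not compile for a type like `Pinned`.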