c++ c++17 c++20 copy-elision return-value-optimization

Copy elision when function returns a local variable and temporary via different paths


I have a class Foo:

#include <iostream>
#include <string>
#include <utility>

class Foo {
    public:
    Foo(std::string s) : s_(std::move(s)) {
        std::cout << "Constructor " << s_ << "\n";
    }
    ~Foo() {
        std::cout << "Destructor " << s_ << "\n";
    }
    Foo(const Foo& other) {
        s_ = other.s_;
        std::cout << "Copy Constructor\n";
    }
    Foo& operator=(const Foo& other) {
        std::cout << "Copy assignment\n";
        s_ = other.s_;
        return *this;
    }
    Foo(Foo&& other) {
        s_ = std::move(other.s_);
        std::cout << "Move Constructor\n";
    }
    Foo& operator=(Foo&& other) {
        std::cout << "Move assignment\n";
        s_ = std::move(other.s_);
        return *this;
    }
    void Set(std::string s) {
        s_ = std::move(s);
    }
    const std::string& Get() const {
        return s_;
    }
    private:
    std::string s_;
};

Normal case where copy elision takes place as expected:

Foo DoSomething(std::string s) {
  Foo foo{std::move(s)};
  if (foo.Get() == "2") {
    return foo;
  }
  if (foo.Get() == "4") {
    return foo;
  }
  foo.Set("1");
  return foo;
}

int main()
{
  Foo ret(DoSomething("5"));
}

I am trying to make sense of these cases:

Case 1

Foo DoSomething(std::string s) {
  Foo foo{std::move(s)};    
  if (foo.Get() == "2") {
    return foo;
  }
  if (foo.Get() == "4") {
      return Foo("4");
  }
  foo.Set("1");
  return foo;
}

int main()
{
  Foo ret(DoSomething("5"));
}

This calls the move constructor, i.e. copy elision does not take place.

Case 2

Foo DoSomething(std::string s) {
  if (s == "2") {
    return Foo("2");
  }
  if (s == "4") {
      return Foo("4");
  }
  Foo foo{std::move(s)};    
  foo.Set("1");
  return foo;
}

int main()
{
  Foo ret(DoSomething("5"));
}

Copy elision works again.

Case 3

Foo DoSomething(std::string s) {
  Foo foo1{"2"};      
  Foo foo2{"4"};  
  if (s == "2") {
    return foo1;
  }
  if (s == "4") {
    return foo2;
  }
  Foo foo{std::move(s)};
  return foo;    
}

int main()
{
  Foo ret(DoSomething("4"));
}

Copy elision does not take place.

Case 4

Foo DoSomething(std::string s) {
  Foo foo1{"2"};      
  Foo foo2{"4"};  
  if (s == "2") {
    return foo1;
  }
  if (s == "4") {
    return foo2;
  }
  Foo foo{std::move(s)};
  return foo;    
}

int main()
{
  Foo ret(DoSomething("5"));
}

Copy elision takes place again.

I ran these examples using a C++20 compiler. I know C++17 only guarantees RVO and not NRVO, but I am very confused now as to when I can expect copy elision to take place in practice.


Solution

  • but I am very confused now as to when I can expect copy elision to take place in practice.

    You shouldn't make any assumptions. The situations in which NRVO is applied vary between compilers, sometimes in unexpected ways due to seemingly irrelevant implementation details (e.g. Clang sometimes behaves differently if the function is declared with an auto return type, even in simple cases).
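Since the outcome depends on the compiler, one way to find out what yours does is to let the object remember where it was first constructed and compare that with the address of the returned object. The following is only a sketch: Probe, MixedReturns and WasConstructedInPlace are hypothetical names that do not appear in the question, and MixedReturns merely mirrors the shape of Case 1.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical probe type: remembers the address at which the object was
// originally constructed and carries that value along through copies and moves.
struct Probe {
    explicit Probe(std::string s)
        : value(std::move(s)), born_at(reinterpret_cast<std::uintptr_t>(this)) {}
    Probe(const Probe& other) : value(other.value), born_at(other.born_at) {}
    Probe(Probe&& other) noexcept
        : value(std::move(other.value)), born_at(other.born_at) {}

    std::string value;
    std::uintptr_t born_at;  // address seen by the string-taking constructor
};

// Same shape as Case 1: two paths return the local, one returns a temporary.
Probe MixedReturns(std::string s) {
    Probe p{std::move(s)};
    if (p.value == "2") {
        return p;
    }
    if (p.value == "4") {
        return Probe("4");
    }
    p.value = "1";
    return p;
}

// True if the object was constructed directly at its current address,
// i.e. no copy or move happened between construction and now.
bool WasConstructedInPlace(const Probe& p) {
    return p.born_at == reinterpret_cast<std::uintptr_t>(&p);
}
```

After `Probe ret(MixedReturns("5"));`, a true result from `WasConstructedInPlace(ret)` suggests that the compiler applied NRVO on that path, and a false result means the move constructor ran. Which one you get is exactly the part that is not guaranteed.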

    Some thoughts on the behavior you see:

    The general problem is that, to implement NRVO, the compiler must decide to place the object that will be returned in the single memory slot reserved for the result object as soon as that object is initialized. If there are multiple candidate objects, the use of this slot must be managed in some way: multiple objects whose lifetimes overlap in the function can't be placed there.
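To make the "single slot" idea concrete, here is a hand-written model of what a function with NRVO conceptually turns into: the caller provides raw storage and the callee constructs its named object directly inside it. This is an illustration under assumptions, not what any particular compiler emits; Widget, MakeInto and Demo are hypothetical names.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical stand-in for Foo.
struct Widget {
    explicit Widget(std::string s) : name(std::move(s)) {}
    std::string name;
};

// Conceptual lowering of "Widget Make(std::string s)" with NRVO applied:
// the caller passes the address of the result slot, and the function's
// "local variable" is constructed right there.
Widget* MakeInto(void* result_slot, std::string s) {
    Widget* w = ::new (result_slot) Widget(std::move(s));
    w->name += "!";  // work on the object as if it were an ordinary local
    // No copy, no move and no destructor call at the end of the function:
    // the object already is the result.
    return w;
}

// Caller side: reserve the slot, let the callee fill it, then own the object.
std::string Demo() {
    alignas(Widget) unsigned char slot[sizeof(Widget)];
    Widget* result = MakeInto(slot, "hello");
    std::string out = result->name;
    result->~Widget();  // the caller is responsible for destroying the result
    return out;
}
```

There is only one `result_slot`, so two locals that are alive at the same time cannot both be constructed in it. That is the constraint the cases below run into.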

    In case 1 it would in principle not be too difficult to implement NRVO with one exception. First, the compiler has to recognize that foo can be constructed at the result object's address. If one of the other branches returning a temporary is chosen, then foo can be replaced by the new temporary object, because foo isn't used anymore in these branches after that replacement. However, the issue here is that Foo has a non-trivial destructor that needs to run after this replacement on the previous object. I suspect that compilers aren't going to analyse the destructor to figure out whether the order can be reversed without affecting observable behavior.

    In case 2 it is easy, because when foo is constructed we know for sure that it will also be the returned object, so it can be constructed at the result object's address without any problems. The only other time that the result object's storage is used is in the other return statements, which can't occur in the same branch.

    In case 3 (and 4) it is a bit tricky: We can't construct both foo1 and foo2 at the result object's address, because their lifetimes and uses in the function overlap. But we can see that if we reach the end of the function, then foo is again guaranteed to be the returned object, so we can construct foo in the result object's storage immediately, provided neither foo1 nor foo2 is ever placed there. Placing either foo1 or foo2 there would also cause the same issue as before: When the branch returning foo is taken, the result object's storage would need to be reused, but the destructor of foo1/foo2 must run after the initialization of the result object. That's probably why you see the case 3/4 behavior. (Again, a compiler could make any other decision as well.)

    So I guess as a rough guideline, you should make sure that, when the Foo object is being initialized, all possible paths of execution (aside from exit by exception) will also return this object. Then the compiler can safely use the result object's slot for this object.
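As a sketch of that guideline, Case 1 could be reshaped so that the path returning a temporary is decided before the named object exists. Counted and DoSomethingReshaped are hypothetical names used only for this illustration, and the counters are there just to observe what happens. Note that this changes side effects compared to Case 1: on the "4" path no named object is constructed and destroyed at all.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for Foo that counts copy and move constructions.
struct Counted {
    static inline int copies = 0;
    static inline int moves = 0;

    explicit Counted(std::string s) : s_(std::move(s)) {}
    Counted(const Counted& other) : s_(other.s_) { ++copies; }
    Counted(Counted&& other) noexcept : s_(std::move(other.s_)) { ++moves; }

    std::string s_;
};

Counted DoSomethingReshaped(std::string s) {
    // Handle the path that returns a temporary first, while no named
    // candidate for the result slot exists yet.
    if (s == "4") {
        return Counted("4");  // prvalue: no copy or move since C++17
    }
    // From here on, every return statement returns foo, so a compiler can
    // construct foo directly in the result object's storage.
    Counted foo{std::move(s)};
    if (foo.s_ == "2") {
        return foo;
    }
    foo.s_ = "1";
    return foo;
}
```

After a call such as `Counted ret(DoSomethingReshaped("5"));`, `Counted::copies` stays at 0 in any case, because a returned local is moved rather than copied when elision does not happen. Whether `Counted::moves` also stays at 0 is up to the compiler, although this shape gives it the easiest job.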