c++ c++11 optimization rvalue-reference

Is there any reason to overload operators with rvalue reference?


I have a templated vector class (a mathematical vector, not a container) and need to overload the common math operators. Does it make sense to overload them like this:

template <typename T, size_t D>
Vector<T, D> operator+(const Vector<T, D>& left, const Vector<T, D>& right)
{
    std::cout << "operator+(&, &)" << std::endl;
    Vector<T, D> result;

    for (size_t i = 0; i < D; ++i)
        result.data[i] = left.data[i] + right.data[i];

    return result;
}

template <typename T, size_t D>
Vector<T, D>&& operator+(const Vector<T, D>& left, Vector<T, D>&& right)
{
    std::cout << "operator+(&, &&)" << std::endl;
    for (size_t i = 0; i < D; ++i)
        right.data[i] += left.data[i];

    return std::move(right);
}

template <typename T, size_t D>
Vector<T, D>&& operator+(Vector<T, D>&& left, const Vector<T, D>& right)
{
    std::cout << "operator+(&&, &)" << std::endl;
    for (size_t i = 0; i < D; ++i)
        left.data[i] += right.data[i];

    return std::move(left);
}

This works fine with the following test code:

auto v1 = math::Vector<int, 10>(1);
auto v2 = math::Vector<int, 10>(7);
auto v3 = v1 + v2;

printVector(v3);

auto v4 = v3 + math::Vector<int, 10>(2);

printVector(v4);

auto v5 = math::Vector<int, 10>(5) + v4;

printVector(v5);

//      ambiguous overload
//      auto v6 = math::Vector<int, 10>(100) + math::Vector<int, 10>(99);

and prints this:

operator+(&, &)
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 
operator+(&, &&)
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
operator+(&&, &)
15, 15, 15, 15, 15, 15, 15, 15, 15, 15,

Adding two rvalues is ambiguous, but I don't think that matters.

Why do I want to do this? For performance: in theory it should be a little faster because no new object is created, but is it in practice? Maybe compilers already optimize the simple operator+(const Vector& left, const Vector& right) well enough that there is no reason to add rvalue overloads?


Solution

  • It depends on your implementation of Vector:

    • If the class is faster to move than to copy, providing the additional overloads for moves could result in performance improvements.
    • Otherwise, the additional overloads will not make the code faster.

    In a comment, you mentioned that Vector looks like this:

    template <typename T, size_t D>
    class Vector
    {
       T data[D];
       // ...
    };
    

    From your code, I also assume that T is a simple arithmetic type (e.g., float, int), for which moving a value is exactly as fast as copying it. Because the elements are stored inside the object itself, a move of Vector<float, D> still has to copy all D elements, so you cannot implement a move operation that is faster than a copy.
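A minimal sketch of that point, assuming a hypothetical stand-in named ArrayVector for the array-backed class (the name and the helper function are illustrative, not part of the original code). The type traits show that such a class is trivially copyable, so "moving" it is the same element-wise copy and the source keeps its values:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the array-backed Vector from the question.
template <typename T, std::size_t D>
struct ArrayVector
{
    T data[D];
};

// The elements live inside the object, so the object is at least D elements large.
static_assert(sizeof(ArrayVector<float, 16>) >= 16 * sizeof(float),
              "all elements are stored inline");

// Trivially copyable: the implicit move constructor performs the same
// member-wise copy as the copy constructor.
static_assert(std::is_trivially_copyable<ArrayVector<float, 16> >::value,
              "array-backed vector of floats is trivially copyable");
static_assert(std::is_trivially_move_constructible<ArrayVector<float, 16> >::value,
              "its move constructor is trivial, i.e. a plain copy");

// Illustrative helper: after a "move", both objects hold the same values,
// because nothing could be stolen from the source.
inline bool moved_from_keeps_values()
{
    ArrayVector<int, 4> a = {{1, 2, 3, 4}};
    ArrayVector<int, 4> b = std::move(a);  // copies all four elements
    return a.data[0] == 1 && a.data[3] == 4 &&
           b.data[0] == 1 && b.data[3] == 4;
}
```

The same reasoning applies for any D: the cost of the move grows with D just like the cost of a copy.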

    To make move operations faster than copying, you could change the representation of your Vector class. Instead of storing a C-array, you could store a pointer to the data, which allows a much more efficient move operation if the size D is large.

    As an analogy, your current Vector implementation is like a std::array<T, D> (which holds a C array internally and has to be copied element by element), but you could switch to something like a std::vector<T> (which holds a pointer to heap storage and is cheap to move). The larger D becomes, the more attractive the switch from a std::array-like to a std::vector-like representation should be.
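As a rough sketch of the std::vector-like alternative, assuming a hypothetical class named HeapVector that simply wraps a std::vector<T> (the name, the constructor, and the helper function are assumptions for illustration). Moving it hands over the heap buffer instead of copying D elements:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical heap-backed variant of the math vector.
// The implicit move constructor moves the std::vector member,
// which transfers ownership of the heap buffer in constant time.
template <typename T, std::size_t D>
struct HeapVector
{
    explicit HeapVector(T value = T()) : data(D, value) {}

    std::vector<T> data;
};

// Illustrative helper: the moved-to object ends up owning the very same
// buffer that the source allocated, so no element was copied.
inline bool move_steals_buffer()
{
    HeapVector<int, 1000> a(7);
    const int* buffer = a.data.data();

    HeapVector<int, 1000> b = std::move(a);

    return b.data.data() == buffer &&
           b.data.size() == 1000 &&
           b.data[999] == 7;
}
```

The trade-off is that every newly constructed HeapVector needs a heap allocation, which is why this representation tends to pay off only when D is large.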

    Let us look closer at the differences when providing overloads for move operations.

    Improvement: in-place updates

    The advantage of your overloads is that they can update one operand in place, which avoids creating a separate object for the result as your operator+(&,&) implementation has to do:

    template <typename T, size_t D>
    Vector<T, D> operator+(const Vector<T, D>& left, const Vector<T, D>& right)
    {
        std::cout << "operator+(&, &)" << std::endl;
        Vector<T, D> result;
    
        for (size_t i = 0; i < D; ++i)
            result.data[i] = left.data[i] + right.data[i];
    
        return result;
    }
    

    In your overloaded version, you can update in-place:

    template <typename T, size_t D>
    Vector<T, D>&& operator+(const Vector<T, D>& left, Vector<T, D>&& right)
    {
        std::cout << "operator+(&, &&)" << std::endl;
        for (size_t i = 0; i < D; ++i)
            right.data[i] += left.data[i];
    
        return std::move(right);
    }
    

    However, with your current implementation of Vector, initializing the result from the returned rvalue reference still copies all D elements, whereas in the non-overloaded version the compiler can get rid of that copy using return-value optimization. If you use a std::vector-like representation, moving is cheap, so the in-place update version should be faster than the original version (operator+(&,&)).
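The in-place overloads could look roughly like this for a heap-backed class. This is a sketch under assumptions: HeapVector and the helper function are hypothetical names, the overloads return by value rather than by Vector&&, and the fourth overload (&&, &&) is an addition that is not in the original code; it is one possible way to resolve the ambiguity the question mentions for two rvalue operands.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical heap-backed vector; cheap to move because only the
// buffer pointer changes hands.
template <typename T, std::size_t D>
struct HeapVector
{
    explicit HeapVector(T value = T()) : data(D, value) {}
    std::vector<T> data;
};

// lvalue + lvalue: a new object is needed for the result.
template <typename T, std::size_t D>
HeapVector<T, D> operator+(const HeapVector<T, D>& left, const HeapVector<T, D>& right)
{
    HeapVector<T, D> result;
    for (std::size_t i = 0; i < D; ++i)
        result.data[i] = left.data[i] + right.data[i];
    return result;
}

// lvalue + rvalue: update the temporary in place, then move it out.
template <typename T, std::size_t D>
HeapVector<T, D> operator+(const HeapVector<T, D>& left, HeapVector<T, D>&& right)
{
    for (std::size_t i = 0; i < D; ++i)
        right.data[i] += left.data[i];
    return std::move(right);
}

// rvalue + lvalue: same idea, reusing the left operand.
template <typename T, std::size_t D>
HeapVector<T, D> operator+(HeapVector<T, D>&& left, const HeapVector<T, D>& right)
{
    for (std::size_t i = 0; i < D; ++i)
        left.data[i] += right.data[i];
    return std::move(left);
}

// rvalue + rvalue (assumed addition): makes this case unambiguous.
template <typename T, std::size_t D>
HeapVector<T, D> operator+(HeapVector<T, D>&& left, HeapVector<T, D>&& right)
{
    for (std::size_t i = 0; i < D; ++i)
        left.data[i] += right.data[i];
    return std::move(left);
}

// Illustrative helper: the result owns the buffer of the rvalue operand.
inline bool rvalue_operand_buffer_is_reused()
{
    HeapVector<int, 8> a(1);
    HeapVector<int, 8> tmp(2);
    const int* buffer = tmp.data.data();

    HeapVector<int, 8> sum = a + std::move(tmp);

    return sum.data.data() == buffer && sum.data[0] == 3;
}
```

Returning by value costs one cheap move here, and it avoids a pitfall of returning Vector&&: a reference to a temporary operand dangles if the caller binds the result to a reference (for example auto&& r = a + Vector(...)) instead of initializing a new object from it.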

    Can a compiler do the in-place update optimization automatically?

    It is highly unlikely that a compiler will be able to do this without help.

    In the non-overloaded version, the compiler sees two operands that are passed as const references. It will most likely be able to perform return-value optimization, but knowing that it may reuse one of the existing objects requires a lot of extra knowledge that the compiler does not have at that point.

    Summary

    Providing overloads for rvalues is reasonable from a pure performance perspective if Vector is faster to move than to copy. If moving is not faster, then there is no gain in providing the overloads.