
Eigen copy constructor vs. operator= performance


At my job, I use the Eigen math library. I have encountered a behavior where using an Eigen Matrix copy constructor in an initializer list for my own classes is significantly slower than using operator= in the constructor body.

In these examples, "Matrix" is a statically-sized dense matrix.

class Slow {
    public:
        Slow(const Matrix &m) : my_matrix{m} {}
    private:
        Matrix my_matrix;
};

class Fast {
    public:
        Fast(const Matrix &m) : my_matrix{} {
            my_matrix = m;
        }
    private:
        Matrix my_matrix;
};

My program frequently invokes the copy constructor of my class, and the difference in performance between the two options above is quite noticeable. I verified that the generated assembly is in fact different.

I understand that the copy-constructor and operator= are not the same, but I am having trouble wading through the Eigen source code to figure out why one is faster than the other. Can anyone with some Eigen expertise weigh in on what happens under the hood that causes operator= to be so much faster? Insight and/or links to recommended reading are equally welcome.


Solution

  • In the "fast" version, the copy is manually handled by Eigen with inlining, explicit loop unrolling, and explicit vectorization. In the "slow" case, the copy-ctor boils down to something like:

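    template<typename T,int Size>
    struct storage {
      T data[Size];
      // Member-wise copy of the raw array, generated by the compiler
      // (conceptually data(other.data), which cannot be spelled out
      // for a built-in array in a mem-initializer).
      storage(const storage &other) = default;
    };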
    

    that we assumed would be properly optimized by the compiler. Unfortunately, if Size is a bit too large, both clang and gcc implement this copy as a call to memcpy, losing the compile-time Size information. On the other hand, letting the compiler handle this copy enables higher-level optimizations, such as removing temporaries in some cases.
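
To make the two copy paths concrete, here is a minimal sketch that does not use Eigen at all. `Storage`, `copy_by_constructor`, and `copy_by_assignment` are hypothetical names chosen for illustration, not Eigen's actual classes or functions. The first function leaves the copy to the compiler, as the "slow" constructor does. The second spells out a loop with a compile-time trip count, which is roughly the kind of code the assignment path gives the optimizer to unroll and vectorize.

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed-size storage block (not Eigen's class).
template <typename T, int Size>
struct Storage {
    T data[Size];
};

// Copy left entirely to the compiler: the implicit copy constructor
// copies the array member-wise. For larger Size, a compiler may lower
// this to a memcpy call instead of inline, size-specialized code.
template <typename T, int Size>
Storage<T, Size> copy_by_constructor(const Storage<T, Size>& src) {
    return Storage<T, Size>(src);
}

// Copy written as an explicit loop whose trip count is a compile-time
// constant, so the optimizer can unroll and vectorize it. This only
// approximates what a library-controlled assignment might do.
template <typename T, int Size>
void copy_by_assignment(Storage<T, Size>& dst, const Storage<T, Size>& src) {
    for (int i = 0; i < Size; ++i) {
        dst.data[i] = src.data[i];
    }
}
```

Both functions produce the same values; only the generated code may differ. Comparing the assembly of the two for a given `Size` and compiler (for example at `-O2`) is one way to check whether the memcpy lowering described above actually occurs in your build.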