My hope is that this discussion might help anyone else having issues with Armadillo and Eigen3.
I've written a wrapper class, Mat, that wraps either an arma::Mat from the armadillo library or an Eigen::Matrix from the Eigen3 library. This is controlled with a flag at compile time.
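The actual wrapper isn't shown, but the compile-time backend selection it describes can be sketched like this. Everything here is hypothetical (`ArmaLikeMat`, `EigenLikeMat`, `USE_ARMA`, and `Backend` are stand-in names, with toy storage classes in place of the real `arma::Mat` / `Eigen::Matrix` so the sketch is self-contained):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for the real backends (arma::Mat / Eigen::Matrix),
// used here only so the sketch compiles on its own.
struct ArmaLikeMat {
    std::vector<double> d; std::size_t r, c;
    ArmaLikeMat(std::size_t r_, std::size_t c_) : d(r_ * c_, 0.0), r(r_), c(c_) {}
    double& at(std::size_t i, std::size_t j)       { return d[i * c + j]; } // row-major
    double  at(std::size_t i, std::size_t j) const { return d[i * c + j]; }
};
struct EigenLikeMat {
    std::vector<double> d; std::size_t r, c;
    EigenLikeMat(std::size_t r_, std::size_t c_) : d(r_ * c_, 0.0), r(r_), c(c_) {}
    double& at(std::size_t i, std::size_t j)       { return d[j * r + i]; } // column-major
    double  at(std::size_t i, std::size_t j) const { return d[j * r + i]; }
};

// Compile-time backend selection, e.g. -DUSE_ARMA on the compiler command line.
#ifdef USE_ARMA
using Backend = ArmaLikeMat;
#else
using Backend = EigenLikeMat;
#endif

// The wrapper exposes one common interface regardless of backend.
class Mat {
public:
    Mat(std::size_t r, std::size_t c) : mData(r, c) {}
    double& operator()(std::size_t i, std::size_t j)       { return mData.at(i, j); }
    double  operator()(std::size_t i, std::size_t j) const { return mData.at(i, j); }
private:
    Backend mData;
};
```

With this pattern, `Mat m(2, 2); m(0, 1) = 3.0;` behaves identically under either flag; only the storage underneath changes.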
Additionally, I've written a Tensor class which uses a Mat as storage. The primary feature of this class is the use of Voigt notation to condense higher-order tensors so they can be stored in a matrix.
Finally, I've written a test that multiplies a 2nd order tensor (i.e. a matrix) and a 1st order tensor (i.e. a vector) multiple times and records the time it takes to complete the operations. I do this with my Mat class and with my Tensor class.
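The test harness itself isn't shown; a minimal version of that kind of timing loop, using `std::chrono` and a plain 2-by-2 product as a stand-in for the Mat/Tensor operation (all names here are hypothetical), might look like:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Plain 2x2 matrix times 2-vector, standing in for the operation being timed.
static void matvec2(const double A[4], const double x[2], double y[2]) {
    y[0] = A[0] * x[0] + A[1] * x[1];
    y[1] = A[2] * x[0] + A[3] * x[1];
}

// Time `reps` repetitions and return elapsed seconds.
static double timeMatvec(std::size_t reps) {
    double A[4] = {1.0, 2.0, 3.0, 4.0};
    double x[2] = {1.0, 1.0};
    double y[2] = {0.0, 0.0};
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < reps; ++i) {
        matvec2(A, x, y);
        x[0] += y[0] * 1e-12;  // feed the result back so the loop isn't optimized away
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

One caveat for a micro-benchmark like this: if the compiler can see that the result is unused, it may delete the whole loop, which is one common source of "impossible" timing comparisons.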
Because Tensor wraps Mat, I would expect its time to be larger. This is the case with Armadillo, close to 20% on average. However, when using Eigen, using Tensor is faster, which makes absolutely no sense to me.
Does anything stick out to anyone?
EDIT: Providing more details.
I've first wrapped arma::Mat into myOwn::armaMat and Eigen::Matrix into myOwn::eigenMat. Both of these simply wrap Armadillo's and Eigen's APIs into a common framework. Finally, based on a compiler flag, myOwn::Mat wraps an armaMat or an eigenMat. I'm not sure about any optimization flags we have turned on.
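Since the optimization flags are unknown, they are worth pinning down before trusting any timing comparison; Eigen's (and Armadillo's) expression templates behave very differently at `-O0` than at `-O2`/`-O3`. A possible set of flags to check for, purely as an illustration (the file and target names are hypothetical):

```shell
# Flags that commonly swing Armadillo-vs-Eigen timings:
#   -O3 / -O2        expression templates are very slow without optimization
#   -DNDEBUG         disables assert(); Eigen also honors -DEIGEN_NO_DEBUG
#   -DARMA_NO_DEBUG  disables Armadillo's bounds checking
#   -march=native    lets both libraries use the CPU's full SIMD instruction set
g++ -O3 -DNDEBUG -DARMA_NO_DEBUG -march=native -o bench bench.cpp -larmadillo
```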
As described above, myOwn::Tensor uses myOwn::Mat as storage. Because of the physical applications I'll be using the Tensor class for, it is templated to be 2D (i.e. 2-by-2 if it's 2nd order) or 3D (i.e. 3-by-3). (In contrast, Mat can be of any size.)
The operator I'm using for timing purposes is: a 2-by-2 matrix (2nd order tensor) times a 2-by-1 matrix (1st order tensor). When using only Mat, I'm essentially using armadillo's or Eigen's expression templating.
When using my Tensor class, I'm overloading operator* as such:
template< typename T1, bool Sym >
moris::Mat< T1 >
operator*(
    moris::Tensor< T1, 2, 2, true > const & aTensor1,
    moris::Tensor< T1, 1, 2, Sym  > const & aTensor2 )
{
    moris::Mat< T1 > tVector( 2, 1 );

    tVector(0) = aTensor1[0]*aTensor2[0] + aTensor1[2]*aTensor2[1];
    tVector(1) = aTensor1[2]*aTensor2[0] + aTensor1[1]*aTensor2[1];

    return tVector;
}
The [] operator on Tensor accesses data from the underlying storage Mat (via the Voigt convention).
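Inferring from the indices used in the operator* above, the Voigt map for a symmetric 2-by-2 tensor appears to be `[0] = T(0,0)`, `[1] = T(1,1)`, `[2] = T(0,1) = T(1,0)`. A small self-contained sketch of that map (the `SymTensor2` name and layout are an assumption, not the actual class) and a check that the unrolled product agrees with the full matrix-vector product:

```cpp
#include <cassert>

// Assumed Voigt storage for a symmetric 2x2 tensor, matching the indices
// used in the question's operator*: v[0]=T(0,0), v[1]=T(1,1), v[2]=T(0,1).
struct SymTensor2 {
    double v[3];
    double at(int i, int j) const {   // expand Voigt storage back to full indices
        return (i == j) ? v[i] : v[2];
    }
};

// The unrolled product, exactly as written in the question's operator*.
static void voigtMatvec(const SymTensor2& T, const double x[2], double y[2]) {
    y[0] = T.v[0] * x[0] + T.v[2] * x[1];
    y[1] = T.v[2] * x[0] + T.v[1] * x[1];
}
```

Under that mapping, `voigtMatvec` produces the same result as forming the full symmetric matrix and multiplying, so the unrolled code itself looks correct; the performance difference would then come from what the compiler does with it, not from the arithmetic.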
"It's complicated."
We offer bindings for both Armadillo and Eigen to R via the add-on packages RcppArmadillo and RcppEigen, so this comparison, or horse-race, question comes up a lot.
And I don't think there is a clear answer. To make matters "worse", Armadillo generally defers to whichever LAPACK/BLAS you have installed, and you could hence be using multicore parallelism, whereas Eigen tends to prefer its own routines. When preparing my Rcpp book I did some timings and found some counter-intuitive results.
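One concrete thing to pin down before comparing the two: which BLAS the Armadillo build actually links against, and how many threads it is using. A possible way to check (the `bench` binary name is hypothetical; the environment variables are the standard OpenBLAS/OpenMP ones):

```shell
# See which BLAS/LAPACK the binary is linked against:
ldd ./bench | grep -i -E 'blas|lapack'

# Pin a multithreaded BLAS to one core for a fair single-thread
# comparison against Eigen's own routines:
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./bench
```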
At the end of the day you may just need to profile your problem.