My hope is that this discussion might help anyone else having issues with Armadillo and Eigen3.
I've written a wrapper class, Mat, that wraps either an arma::Mat from the armadillo library or an Eigen::Matrix from the Eigen3 library. This is controlled with a flag at compile time.
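The actual wrapper isn't shown, but the compile-time backend selection it describes can be sketched like this. Everything here is hypothetical (`ArmaLikeMat`, `EigenLikeMat`, `USE_ARMA`, and `Backend` are stand-in names, with toy storage classes in place of the real `arma::Mat` / `Eigen::Matrix` so the sketch is self-contained):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for the real backends (arma::Mat / Eigen::Matrix),
// used here only so the sketch compiles on its own.
struct ArmaLikeMat {
    std::vector<double> d; std::size_t r, c;
    ArmaLikeMat(std::size_t r_, std::size_t c_) : d(r_ * c_, 0.0), r(r_), c(c_) {}
    double& at(std::size_t i, std::size_t j)       { return d[i * c + j]; } // row-major
    double  at(std::size_t i, std::size_t j) const { return d[i * c + j]; }
};
struct EigenLikeMat {
    std::vector<double> d; std::size_t r, c;
    EigenLikeMat(std::size_t r_, std::size_t c_) : d(r_ * c_, 0.0), r(r_), c(c_) {}
    double& at(std::size_t i, std::size_t j)       { return d[j * r + i]; } // column-major
    double  at(std::size_t i, std::size_t j) const { return d[j * r + i]; }
};

// Compile-time backend selection, e.g. -DUSE_ARMA on the compiler command line.
#ifdef USE_ARMA
using Backend = ArmaLikeMat;
#else
using Backend = EigenLikeMat;
#endif

// The wrapper exposes one common interface regardless of backend.
class Mat {
public:
    Mat(std::size_t r, std::size_t c) : mData(r, c) {}
    double& operator()(std::size_t i, std::size_t j)       { return mData.at(i, j); }
    double  operator()(std::size_t i, std::size_t j) const { return mData.at(i, j); }
private:
    Backend mData;
};
```

With this pattern, `Mat m(2, 2); m(0, 1) = 3.0;` behaves identically under either flag; only the storage underneath changes.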
Additionally, I've written a Tensor class which uses a Mat as storage. The primary feature of this class is the use of Voigt notation to condense higher-order tensors so they can be stored in a matrix.
Finally, I've written a test that multiplies a 2nd order tensor (i.e. a matrix) and a 1st order tensor (i.e. a vector) multiple times and records the time it takes to complete the operations. I do this with my Mat class and with my Tensor class.
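The test harness itself isn't shown; a minimal version of that kind of timing loop, using `std::chrono` and a plain 2-by-2 product as a stand-in for the Mat/Tensor operation (all names here are hypothetical), might look like:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Plain 2x2 matrix times 2-vector, standing in for the operation being timed.
static void matvec2(const double A[4], const double x[2], double y[2]) {
    y[0] = A[0] * x[0] + A[1] * x[1];
    y[1] = A[2] * x[0] + A[3] * x[1];
}

// Time `reps` repetitions and return elapsed seconds.
static double timeMatvec(std::size_t reps) {
    double A[4] = {1.0, 2.0, 3.0, 4.0};
    double x[2] = {1.0, 1.0};
    double y[2] = {0.0, 0.0};
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < reps; ++i) {
        matvec2(A, x, y);
        x[0] += y[0] * 1e-12;  // feed the result back so the loop isn't optimized away
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

One caveat for a micro-benchmark like this: if the compiler can see that the result is unused, it may delete the whole loop, which is one common source of "impossible" timing comparisons.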
Because Tensor wraps Mat, I would expect its time to be larger. This is the case with Armadillo, close to 20% on average. However, when using Eigen, using Tensor is faster, which makes absolutely no sense to me.
Does anything stick out to anyone?
EDIT: Providing more details.
I've first wrapped arma::Mat into myOwn::armaMat and Eigen::Matrix into myOwn::eigenMat. Both of these simply wrap Armadillo's and Eigen's APIs into a common framework. Finally, based on a compiler flag, myOwn::Mat wraps an armaMat or an eigenMat. I'm not sure about any optimization flags we have turned on.
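Since the optimization flags are unknown, they are worth pinning down before trusting any timing comparison; Eigen's (and Armadillo's) expression templates behave very differently at `-O0` than at `-O2`/`-O3`. A possible set of flags to check for, purely as an illustration (the file and target names are hypothetical):

```shell
# Flags that commonly swing Armadillo-vs-Eigen timings:
#   -O3 / -O2        expression templates are very slow without optimization
#   -DNDEBUG         disables assert(); Eigen also honors -DEIGEN_NO_DEBUG
#   -DARMA_NO_DEBUG  disables Armadillo's bounds checking
#   -march=native    lets both libraries use the CPU's full SIMD instruction set
g++ -O3 -DNDEBUG -DARMA_NO_DEBUG -march=native -o bench bench.cpp -larmadillo
```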
As described above, myOwn::Tensor uses myOwn::Mat as storage. Because of the physical applications I'll be using the Tensor class for, it is templated to be 2D (i.e. 2-by-2 if it's 2nd order) or 3D (i.e. 3-by-3). (In contrast, Mat can be of any size.)
The operator I'm using for timing purposes is: a 2-by-2 matrix (2nd order tensor) times a 2-by-1 matrix (1st order tensor). When using only Mat, I'm essentially using armadillo's or Eigen's expression templating.
When using my Tensor class, I'm overloading operator* as such:
template< typename T1, bool Sym >
moris::Mat< T1 >
operator*(
    moris::Tensor< T1, 2, 2, true > const & aTensor1,
    moris::Tensor< T1, 1, 2, Sym  > const & aTensor2 )
{
    moris::Mat< T1 > tVector( 2, 1 );

    tVector(0) = aTensor1[0]*aTensor2[0] + aTensor1[2]*aTensor2[1];
    tVector(1) = aTensor1[2]*aTensor2[0] + aTensor1[1]*aTensor2[1];

    return tVector;
}
The [] operator on Tensor accesses data from the underlying storage Mat (via the Voigt convention).
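Inferring from the indices used in the operator* above, the Voigt map for a symmetric 2-by-2 tensor appears to be `[0] = T(0,0)`, `[1] = T(1,1)`, `[2] = T(0,1) = T(1,0)`. A small self-contained sketch of that map (the `SymTensor2` name and layout are an assumption, not the actual class) and a check that the unrolled product agrees with the full matrix-vector product:

```cpp
#include <cassert>

// Assumed Voigt storage for a symmetric 2x2 tensor, matching the indices
// used in the question's operator*: v[0]=T(0,0), v[1]=T(1,1), v[2]=T(0,1).
struct SymTensor2 {
    double v[3];
    double at(int i, int j) const {   // expand Voigt storage back to full indices
        return (i == j) ? v[i] : v[2];
    }
};

// The unrolled product, exactly as written in the question's operator*.
static void voigtMatvec(const SymTensor2& T, const double x[2], double y[2]) {
    y[0] = T.v[0] * x[0] + T.v[2] * x[1];
    y[1] = T.v[2] * x[0] + T.v[1] * x[1];
}
```

Under that mapping, `voigtMatvec` produces the same result as forming the full symmetric matrix and multiplying, so the unrolled code itself looks correct; the performance difference would then come from what the compiler does with it, not from the arithmetic.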
"It's complicated."
We offer bindings for both Armadillo and Eigen to R via the add-on packages RcppArmadillo and RcppEigen, so this comparison, or horse-race, question comes up a lot.
And I don't think there is a clear answer. To make matters "worse", Armadillo generally defers to whichever LAPACK/BLAS you have installed, and you could hence be using multicore parallelism, whereas Eigen tends to prefer its own routines. When preparing my Rcpp book I did some timings and found some counter-intuitive results.
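One concrete thing to pin down before comparing the two: which BLAS the Armadillo build actually links against, and how many threads it is using. A possible way to check (the `bench` binary name is hypothetical; the environment variables are the standard OpenBLAS/OpenMP ones):

```shell
# See which BLAS/LAPACK the binary is linked against:
ldd ./bench | grep -i -E 'blas|lapack'

# Pin a multithreaded BLAS to one core for a fair single-thread
# comparison against Eigen's own routines:
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./bench
```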
At the end of the day you may just need to profile your problem.