In Eigen library to implement A^T*A
one can write:
X.template triangularView<Lower>().setZero();
X.template selfadjointView<Lower>().rankUpdate(A.transpose());
Is there any better (more efficient) way to write it, if A is a lower triangular matrix? I tried following, but it gives compilation error:
X.template selfadjointView<Lower>().rankUpdate(A.template triangularView<Lower>().transpose());
It gives error:
error: no matching member function for call to 'rankUpdate'
Eigen typically includes only those matrix routines that have a corresponding function in BLAS. There is no BLAS routine for this particular case. One additional wrinkle is that Eigen's implementation of rankUpdate and triangular matrix multiplication are not parallelized unless you define -DEIGEN_USE_BLAS
and link against a BLAS library that does this, e.g. OpenBLAS.
I did some testing with various implementations.
void trtr(Eigen::MatrixXf& out, const Eigen::MatrixXf& in)
{
out = Eigen::MatrixXf::Zero(in.rows(), in.cols());
out.selfadjointView<Eigen::Lower>().rankUpdate(in.transpose());
out.triangularView<Eigen::StrictlyUpper>() = out.transpose();
}
Only parallelized with OpenBLAS. Then it is pretty fast, otherwise somewhat slow.
void trtr(Eigen::MatrixXf& out, const Eigen::MatrixXf& in)
{ out.noalias() = in.transpose() * in.triangularView<Eigen::Lower>(); }
Only parallelized with OpenBLAS. Even then it is somewhat slow.
void trtr(Eigen::MatrixXf& out, const Eigen::MatrixXf& in)
{ out.noalias() = in.transpose() * in; }
Always parallelized. Consistently fast.
I tried implementing my own version that splits the matrix in blocks to remove redundant operations. This is somewhat faster for large matrices, especially when using OpenBLAS.
It is not well tuned for small matrices.
void trtr_recursive
(Eigen::Ref<Eigen::MatrixXf> out,
const Eigen::Ref<const Eigen::MatrixXf>& in) noexcept
{
Eigen::Index size = out.rows();
if(size <= 16) {
/*
* Two notes:
* 1. The cutoff point is a possible tuning parameter
* 2. This is the only place where we assume that the upper triangle
* is initialized to zero. We can remove this assumption by making
* a copy of the input into a fixed size matrix. Use Eigen::Matrix
* with the MaxRows and MaxCols template argument for this
*/
out.selfadjointView<Eigen::Lower>().rankUpdate(in.transpose());
return;
}
Eigen::Index full = (size + 1) >> 1;
Eigen::Index part = size >> 1;
const auto& fullin = in.bottomLeftCorner(full, full);
const auto& bottomright = in.bottomRightCorner(part, part);
const auto& bottomleft = in.bottomLeftCorner(part, full);
out.topLeftCorner(full, full).selfadjointView<Eigen::Lower>()
.rankUpdate(fullin.transpose());
out.bottomLeftCorner(part, full).noalias() +=
bottomright.transpose().triangularView<Eigen::Upper>() *
bottomleft;
trtr_recursive(out.topLeftCorner(part, part),
in.topLeftCorner(part, part));
trtr_recursive(out.bottomRightCorner(part, part),
bottomright);
}
void trtr(Eigen::MatrixXf& out, const Eigen::MatrixXf& in)
{
out = Eigen::MatrixXf::Zero(in.rows(), in.cols());
trtr4_recursive(out, in);
out.triangularView<Eigen::StrictlyUpper>() = out.transpose();
}