Pretty much the title. I'm trying to think up a good way to formulate this without creating any temporaries so I can actually make use of Eigen's speediness. The best I could come up with is this formulation:
Essentially, I end up with a 3nx3n
matrix being multiplied by a 3nxn
matrix, and I end up with a 3nxn
matrix of the transformed coordinates. Is this the most efficient approach? I don't love that there is the extra overhead (both performance and memory-wise) of constructing these full matrices.
Is there a way to maybe use an Eigen::Map
on a contiguous set of data, e.g. [a0, b0, c0, d0, e0, f0, g0, h0, i0, a1, b1, c1, d1,...]
, and have it behave like the sparse matrix below?
Additionally, if my above approach is the most sensible way, is there an efficient way to collapse the resulting matrix into a non-sparse 3xN
matrix of the results?
This will not work the way you hope. The main issue is that for a dense matrix format, you would vastly increase the number of operations -- most of them being zero -- while for sparse matrix products, Eigen doesn't have a block-diagonal sparse format (though we can build one if we want.
Instead, Eigen can optimize well for small fixed-size matrices or blocks of larger matrices. It is much better to use something like Matrix3f
or Matrix3Xf
than a MatrixXf
that happens to be size 3 in one or both dimensions at runtime.
Therefore a good choice is to do something like this:
int rotation_count = ...;
Eigen::Matrix3Xf rotations = Eigen::Matrix3Xf::Random(3, rotation_count * 3);
Eigen::Matrix3Xf positions = Eigen::Matrix3Xf::Random(3, rotation_count);
Eigen::Matrix3Xf out(3, rotation_count);
# pragma omp parallel for
for(Eigen::Index i = 0; i < rotation_count; ++i)
out.col(i).noalias() =
rotations.middleCols<3>(i * 3) * positions.col(i);
Note two points in particular:
middleCols
refers to a consecutive 3x3 segment instead of having a large stride from one column to the next.middleCols
to create a block that carries the compile-time information that it is 3x3We can further improve this by padding to 4x4 matrices. This wastes a bit of memory but it allows better vectorization since 4 floats fill one SSE register or 4 doubles fill one AVX register.
int rotation_count = ...;
// With real data, keep the 4-th column and every 4-th row zero
Eigen::Matrix4Xf rotations = Eigen::Matrix4Xf::Random(4, rotation_count * 4);
Eigen::Matrix4Xf positions = Eigen::Matrix4Xf::Random(4, rotation_count);
Eigen::Matrix4Xf out(4, rotation_count);
# pragma omp parallel for
for(Eigen::Index i = 0; i < rotation_count; ++i)
out.col(i).noalias() =
rotations.middleCols<4>(i * 4) * positions.col(i);
On my system, the 3x3 implementation does 100,000 repetitions with 10,000 rotation matrices in 3.7 seconds and the 4x4 implementation takes 2,6 seconds. Make sure to compile with -DNDEBUG
for production to remove the cost of assertions.
Alternatively you could try and get Eigen's unsupported tensor module to work for this but I'm not familiar with its use.