I'm using Eigen extensively in a scientific application I've been developing for some time. Since I'm implementing a numerical method, a number below a certain threshold (e.g. 1e-15
) is not a point of interest, and slowing down calculations, and increasing error rate.
Hence, I want to round off numbers below that threshold to 0
. I can do it with a for
loop, but hammering multiple relatively big matrices (2M cells and up per matrix) with a for
-if
loop is expensive and slowing me down since I need to do it multiple times.
Is there a more efficient way to do this with Eigen
library?
In other words, I'm trying to eliminate numbers below a certain threshold in my calculation pipeline.
The shortest way to write what you want, is
void foo(Eigen::VectorXf& inout, float threshold)
{
inout = (threshold < inout.array().abs()).select(inout, 0.0f);
}
However, neither comparisons nor the select
method get vectorized by Eigen (as of now).
If speed is essential, you need to either write some manual SIMD code, or write a custom functor which supports a packet
method (this uses internal functionality of Eigen, so it is not guaranteed to be stable!):
template<typename Scalar> struct threshold_op {
Scalar threshold;
threshold_op(const Scalar& value) : threshold(value) {}
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const{
return threshold < std::abs(a) ? a : Scalar(0); }
template<typename Packet>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const {
using namespace Eigen::internal;
return pand(pcmp_lt(pset1<Packet>(threshold),pabs(a)), a);
}
};
namespace Eigen { namespace internal {
template<typename Scalar>
struct functor_traits<threshold_op<Scalar> >
{ enum {
Cost = 3*NumTraits<Scalar>::AddCost,
PacketAccess = packet_traits<Scalar>::HasAbs };
};
}}
This can then be passed to unaryExpr
:
inout = inout.unaryExpr(threshold_op<float>(threshold));
Godbolt-Demo (should work with SSE/AVX/AVX512/NEON/...): https://godbolt.org/z/bslATI
It might actually be that the only reason for your slowdown are subnormal numbers. In that case, a simple
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
should do the trick (cf: Why does changing 0.1f to 0 slow down performance by 10x?)