Rounding numbers below a certain threshold to zero in Eigen

I'm using Eigen extensively in a scientific application I've been developing for some time. Since I'm implementing a numerical method, a number below a certain threshold (e.g. 1e-15) is not a point of interest, and slowing down calculations, and increasing error rate.

Hence, I want to round off numbers below that threshold to 0. I can do it with a for loop, but hammering multiple relatively big matrices (2M cells and up per matrix) with a for-if loop is expensive and slowing me down since I need to do it multiple times.

Is there a more efficient way to do this with Eigen library?

In other words, I'm trying to eliminate numbers below a certain threshold in my calculation pipeline.

Solution

The shortest way to write what you want, is

void foo(Eigen::VectorXf& inout, float threshold)
{
    inout = (threshold < inout.array().abs()).select(inout, 0.0f);
}

However, neither comparisons nor the select method get vectorized by Eigen (as of now).

If speed is essential, you need to either write some manual SIMD code, or write a custom functor which supports a packet method (this uses internal functionality of Eigen, so it is not guaranteed to be stable!):

template<typename Scalar> struct threshold_op {
  Scalar threshold;
  threshold_op(const Scalar& value) : threshold(value) {}
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const{
    return threshold < std::abs(a) ? a : Scalar(0); }
  template<typename Packet>
  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const {
    using namespace Eigen::internal;
    return pand(pcmp_lt(pset1<Packet>(threshold),pabs(a)), a);
  }
};
namespace Eigen { namespace internal {

template<typename Scalar>
struct functor_traits<threshold_op<Scalar> >
{ enum {
    Cost = 3*NumTraits<Scalar>::AddCost,
    PacketAccess = packet_traits<Scalar>::HasAbs };
};

}}

This can then be passed to unaryExpr:

inout = inout.unaryExpr(threshold_op<float>(threshold));

Godbolt-Demo (should work with SSE/AVX/AVX512/NEON/...): https://godbolt.org/z/bslATI

It might actually be that the only reason for your slowdown are subnormal numbers. In that case, a simple

_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

should do the trick (cf: Why does changing 0.1f to 0 slow down performance by 10x?)