Search code examples
c++matrixeigen

Rounding numbers below a certain threshold to zero in Eigen


I'm using Eigen extensively in a scientific application I've been developing for some time. Since I'm implementing a numerical method, a number below a certain threshold (e.g. 1e-15) is not a point of interest, and slowing down calculations, and increasing error rate.

Hence, I want to round off numbers below that threshold to 0. I can do it with a for loop, but hammering multiple relatively big matrices (2M cells and up per matrix) with a for-if loop is expensive and slowing me down since I need to do it multiple times.

Is there a more efficient way to do this with Eigen library?

In other words, I'm trying to eliminate numbers below a certain threshold in my calculation pipeline.


Solution

  • The shortest way to write what you want, is

    void foo(Eigen::VectorXf& inout, float threshold)
    {
        inout = (threshold < inout.array().abs()).select(inout, 0.0f);
    }
    

    However, neither comparisons nor the select method get vectorized by Eigen (as of now).

    If speed is essential, you need to either write some manual SIMD code, or write a custom functor which supports a packet method (this uses internal functionality of Eigen, so it is not guaranteed to be stable!):

    template<typename Scalar> struct threshold_op {
      Scalar threshold;
      threshold_op(const Scalar& value) : threshold(value) {}
      EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const{
        return threshold < std::abs(a) ? a : Scalar(0); }
      template<typename Packet>
      EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const {
        using namespace Eigen::internal;
        return pand(pcmp_lt(pset1<Packet>(threshold),pabs(a)), a);
      }
    };
    namespace Eigen { namespace internal {
    
    template<typename Scalar>
    struct functor_traits<threshold_op<Scalar> >
    { enum {
        Cost = 3*NumTraits<Scalar>::AddCost,
        PacketAccess = packet_traits<Scalar>::HasAbs };
    };
    
    }}
    

    This can then be passed to unaryExpr:

    inout = inout.unaryExpr(threshold_op<float>(threshold));
    

    Godbolt-Demo (should work with SSE/AVX/AVX512/NEON/...): https://godbolt.org/z/bslATI


    It might actually be that the only reason for your slowdown are subnormal numbers. In that case, a simple

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    

    should do the trick (cf: Why does changing 0.1f to 0 slow down performance by 10x?)