Search code examples
pythonnumpynumpy-ndarray

best way to check if a numpy array is all non negative


This works, but not algorithmically optimal since I dont need the min value to be stored while the function is parsing the array:

def is_non_negative(m):
    return np.min(m) >= 0

Edit: Depending on the data an optimal function could indeed save a lot because it will terminate at the first encounter of a negative value. If only one negative value is expected, time will be cut by a factor of two in average. However building the optimal algorithm outside numpy library will be at a huge cost (Python code vs C++ code).


Solution

  • One pure-Numpy solution is to use a chunk based strategy:

    def is_non_negative(m):
        chunkSize = max(min(65536, m.size/8), 4096) # Auto-tunning
        for i in range(0, m.size, chunkSize):
            if np.min(m[i:i+chunkSize]) < 0:
                return False
        return True
    

    This solution is only efficient if the arrays are big, and chunks are big enough for the Numpy call overhead to be small and small enough to split the global array in many parts (so to benefit from the early cut). The chunk size needs to be pretty big so to balance the relatively big overhead of np.min on small arrays.


    Here is a Numba solution:

    import numba as nb
    
    # Eagerly compiled funciton for some mainstream data-types.
    @nb.njit(['(float32[::1],)', '(float64[::1],)', '(int_[::1],)'])
    def is_non_negative_nb(m):
        for e in m:
            if e < 0:
                return False
        return True
    

    It turns out this is faster than using np.min on my machine although the code is not well auto-vectorized (ie. do not use SIMD instruction) by LLVM-Lite (the JIT of Numba).

    For an even faster code, you need to use a C/C++ code and use a chunk-based SIMD-friendly code, and possibly use SIMD intrinsics if the compiler does not generate an efficient code which is unfortunately rather frequent in this case.