Search code examples
pythonarraysperformancenumpyscientific-computing

Numpy: compare each element of array with all other elements (± constant)


I have a 1D Numpy array A of length N. For each element x in the array, I want to know what proportion of all elements in the array are within the range [x-eps; x+eps], where eps is a constant. N is in the order of 15,000.

At present I do it as follows (minimal example):

import numpy as np

N = 15000
eps = 0.01
A = np.random.rand(N, 1)
prop = np.array([np.mean((A >= x - eps) & (A <= x + eps)) for x in A])

.. which takes around 1 sec on my computer.

My question: is there a more efficient way of doing this?

Edit: I think @jdehesa suggestion in the comments would work as follows:

prop = np.isclose(A, A.T, atol=eps, rtol=0).mean(axis=1)

It's a nice concise solution, but without a speed advantage (on my computer).


Solution

  • That's a good setup to leverage np.searchsorted -

    sidx = A.argsort()
    ridx = np.searchsorted(A, A+eps, 'right', sorter=sidx)
    lidx = np.searchsorted(A, A-eps, 'left', sorter=sidx)
    out = ridx - lidx 
    

    Timings -

    In [71]: N = 15000
        ...: eps = 0.01
        ...: A = np.random.rand(N)
    
    In [72]: %timeit np.array([np.sum((A >= x - eps) & (A <= x + eps)) for x in A])
    560 ms ± 5.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [73]: %%timeit
        ...: sidx = A.argsort()
        ...: ridx = np.searchsorted(A, A+eps, 'right', sorter=sidx)
        ...: lidx = np.searchsorted(A, A-eps, 'left', sorter=sidx)
        ...: out = ridx - lidx
    5.35 ms ± 47.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    Further improvement with pre-sorting :

    In [81]: %%timeit
        ...: sidx = A.argsort()
        ...: b = A[sidx]
        ...: ridx = np.searchsorted(b, A+eps, 'right')
        ...: lidx = np.searchsorted(b, A-eps, 'left')
        ...: out = ridx - lidx
    3.93 ms ± 19.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    As stated in the comments, for the mean equivalent version, simply divide final array output by N.