python numpy

How to find the threshold that will yield the desired number of array elements


Given an array of numbers and a target count, I want to find the threshold such that the number of elements above it equals the target (or is as close to it as possible).

For example:

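import numpy as np

arr = np.random.rand(100)
target = 80
# Scan thresholds in steps of 0.01 and stop at the first one
# that leaves fewer than `target` elements above it.
for i in range(100):
    t = i * 0.01
    if (arr > t).sum() < target:
        break
print(t)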

However, this is inefficient and only as precise as the 0.01 step size. Perhaps someone has already solved this problem?

EDIT:

In the end I found scipy.optimize.bisect, which works perfectly.


Solution

  • I am using scipy.optimize.bisect:

    import scipy.optimize
    import numpy as np
    
    def compute_cutoff(arr: np.ndarray, volume: float) -> float:
        """
        Compute the cutoff to attain the desired volume.
        Returns a cutoff such that (arr > cutoff).sum() is within
        max(1, 1% of volume) of volume.
        :param np.ndarray arr: an array with values in (0 .. 1)
        :param float volume: desired volume in number of voxels
        """
        tolerance = max(1, volume * .01)

        def err_fn(x: float) -> float:
            # Signed count error, reported as 0 once within tolerance
            # so that bisect treats x as a root and stops.
            diff = (arr > x).sum() - volume
            return diff if abs(diff) > tolerance else 0

        # probably this [-2, 2] could be tighter [0, 1], but just to be safe.
        return scipy.optimize.bisect(err_fn, -2, 2, maxiter=50, disp=False)
    
    
    
    arr = np.random.rand(100)
    t = compute_cutoff(arr, 80)
    print(t)
    

    This prints a value close to 0.2, since roughly 80 of the 100 uniformly distributed samples lie above that point.
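
For comparison, here is a minimal sketch of a different technique, not the bisect approach above: sort the array and read the threshold directly off the sorted values. The function name cutoff_by_sorting is hypothetical, and the sketch assumes a non-empty array. When all values are distinct, the count above the returned threshold equals the target exactly; with ties at the cut point it can fall short.

```python
import numpy as np


def cutoff_by_sorting(arr: np.ndarray, target: int) -> float:
    """Return t such that (arr > t).sum() == target when values are distinct.

    Hypothetical helper for illustration; assumes arr is non-empty.
    """
    flat = np.sort(arr, axis=None)  # flattened copy, ascending order
    n = flat.size
    if target <= 0:
        # Nothing is strictly greater than the maximum.
        return float(flat[-1])
    if target >= n:
        # Step just below the minimum so every element is above the threshold.
        return float(np.nextafter(flat[0], -np.inf))
    # The `target` largest values are flat[n - target:], so the largest
    # excluded value works as a threshold for a strict `>` comparison.
    return float(flat[n - target - 1])


rng = np.random.default_rng(0)
arr = rng.random(100)
t = cutoff_by_sorting(arr, 80)
print(t, (arr > t).sum())
```

Sorting costs O(n log n) once and needs no tolerance or search interval. If only a single threshold is needed for a large array, np.partition could likely replace the full sort, but that is left out here to keep the sketch short.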