
NumPy: Find sorted indices from a masked 2D array above and below a threshold


I have a 2D masked array of values that I need to sort from lowest to highest. For example:

import numpy as np

# Make a random masked array
>>> ar = np.ma.array(np.round(np.random.normal(50, 10, 20), 1),
                     mask=np.random.binomial(1, .2, 20)).reshape((4,5))
>>> print(ar)
[[-- 51.9 38.3 46.8 43.3]
 [52.3 65.0 51.2 46.5 --]
 [56.7 51.1 -- 38.6 33.5]
 [45.2 56.8 74.1 58.4 56.4]]

# Sort the array from lowest to highest, with a flattened index
>>> sorted_ind = ar.argsort(axis=None)
>>> print(sorted_ind)
[14  2 13  4 15  8  3 11  7  1  5 19 10 16 18  6 17  0 12  9]

I need to divide the sorted indices into two subsets: those whose values are less than or equal to a given datum, and those whose values are greater than or equal to it. The masked values are not needed and should be removed. For example, with datum = 51.1, how do I filter sorted_ind down to the 10 indices at or above the datum and the 8 indices at or below it? (Note: one index is shared because both criteria include equality, and the 3 masked values can be dropped from the analysis.) I need to preserve the flattened index positions, as I use np.unravel_index(ind, ar.shape) later on.


Solution

  • Use np.ma.where:

    import numpy as np
    np.random.seed(0)
    # Make a random masked array
    ar = np.ma.array(np.round(np.random.normal(50, 10, 20), 1),
                     mask=np.random.binomial(1, .2, 20)).reshape((4,5))
    # Sort the array from lowest to highest, with a flattened index
    sorted_ind = ar.argsort(axis=None)

    # Values in sorted order; masked entries stay masked and np.ma.where skips them
    tmp = ar.flatten()[sorted_ind]
    print(sorted_ind[np.ma.where(tmp <= 51.0)])
    print(sorted_ind[np.ma.where(tmp >= 51.0)])
    

    Since tmp is sorted and argsort places the masked entries last, you can use np.searchsorted() instead:

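    tmp = ar.flatten()[sorted_ind].compressed()  # compressed() drops the masked values
    lo = np.searchsorted(tmp, 51.0, side='left')   # first position with value >= 51.0
    hi = np.searchsorted(tmp, 51.0, side='right')  # first position with value > 51.0
    print(sorted_ind[:hi])          # indices of values <= 51.0
    print(sorted_ind[lo:len(tmp)])  # indices of values >= 51.0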
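
To check the searchsorted approach against the numbers in the question, here is a minimal sketch that rebuilds the printed array by hand and splits it at datum = 51.1. The helper name split_sorted_indices is hypothetical, and the 0.0 placeholders stored under the mask are an assumption (the question does not show the hidden values). It relies on argsort placing masked entries last, which is the default behaviour for masked arrays.

```python
import numpy as np

# The array printed in the question. The three "--" cells are masked;
# the 0.0 stored underneath them is only a placeholder (an assumption).
values = np.array([[ 0.0, 51.9, 38.3, 46.8, 43.3],
                   [52.3, 65.0, 51.2, 46.5,  0.0],
                   [56.7, 51.1,  0.0, 38.6, 33.5],
                   [45.2, 56.8, 74.1, 58.4, 56.4]])
mask = np.zeros(values.shape, dtype=bool)
mask[0, 0] = mask[1, 4] = mask[2, 2] = True
ar = np.ma.array(values, mask=mask)


def split_sorted_indices(ar, datum):
    """Hypothetical helper: return (below, above) flat indices, sorted by value.

    below holds unmasked positions with value <= datum,
    above holds unmasked positions with value >= datum.
    """
    sorted_ind = ar.argsort(axis=None)   # masked entries are sorted to the end
    valid_ind = sorted_ind[:ar.count()]  # keep only the unmasked positions
    sorted_vals = np.ma.getdata(ar).ravel()[valid_ind]
    lo = np.searchsorted(sorted_vals, datum, side="left")   # first value >= datum
    hi = np.searchsorted(sorted_vals, datum, side="right")  # first value > datum
    return valid_ind[:hi], valid_ind[lo:]


below, above = split_sorted_indices(ar, 51.1)
print(below)  # [14  2 13  4 15  8  3 11]
print(above)  # [11  7  1  5 19 10 16 18  6 17]

# The flat indices can still be mapped back to (row, column) positions.
rows, cols = np.unravel_index(below, ar.shape)
```

This gives the 8 and 10 indices asked for, with flat index 11 (the cell holding 51.1) appearing in both subsets.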