Tags: python, arrays, numpy, masking, masked-array

Masking only a certain range of indices in numpy


I have an ndarray. In each row, I need to mask every number below 50 until the first number greater than 50 is encountered. This should be done from both the beginning and the end of the row; as soon as a number greater than 50 is encountered, the masking should stop.

One row looks like:

    [ 0   1   1   0  57  121  120  157  77  14   0   3   0   0   0   0  67 100  
    98  97 101 129 139 105  97 105 181 126  10   0   0]

I want something like:

    [-- -- -- -- 57 121 120 157 77 14  0  3  0  0  0  0 67 100 98 97
     101 129 139 105 97 105 181 126 -- -- --]

The masking at the start should stop right before 57 (the fifth element), and the masking at the end should begin right after 126 (the fourth-last element).

I have tried ma.masked_where, but it masks the 0s in between as well, which I don't want.

So, is there a way to do this, or a way to specify a range of indices (e.g. [0:40]) that should be the only part masked?

I do not want to change the dimension of the array after it is masked. Also, the presence of -- wouldn't make a difference to my purpose.


Solution

  • You can use either Boolean indexing or manual iteration. The former is more efficient for small arrays; the latter for large arrays with only a few out-of-scope values on either side, since the generator stops at the first match while argmax scans the whole array.

    Boolean indexing

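    import numpy as np
    
    x = np.array([0, 0, 0, 2, 3, 51, 34, 1, 23, 32, 32, 52, 0, 0, 0])
    
    # argmax on a Boolean array returns the index of the first True
    start = (x > 50).argmax()
    # search the reversed array to find where the last value above 50 ends
    end = len(x) - (x[::-1] > 50).argmax()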
    
    print(x[start: end])
    
    [51 34  1 23 32 32 52]
    

    Manual iteration

    Using next with a generator expression and enumerate:

    start = next(idx for idx, val in enumerate(x) if val > 50)
    end = len(x) - next(idx for idx, val in enumerate(reversed(x)) if val > 50)
    
    print(x[start: end])
    
    [51 34  1 23 32 32 52]
    

    Masking

    If you wish to replace out-of-scope values with np.nan, you can assign accordingly, remembering to convert to float first, as NaN values are float:

    x = x.astype(float)
    x[:start] = np.nan
    x[end:] = np.nan
    
    print(x)
    
    [nan nan nan nan nan 51. 34.  1. 23. 32. 32. 52. nan nan nan]
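
Since the question asks for a masked array that keeps its shape and dtype, here is a minimal sketch of that variant. It swaps argmax for a different technique, a cumulative logical OR from each side, so that it works on a single row or on every row of a 2-D array at once. The function name mask_edges and the threshold parameter are hypothetical names chosen for illustration, and the choice to fully mask a row that has no value above the threshold is an assumption, not something the question specifies.

```python
import numpy as np


def mask_edges(arr, threshold=50):
    """Mask leading and trailing values in each row, up to the first and
    last value greater than `threshold`. (Hypothetical helper.)"""
    arr = np.asarray(arr)
    above = arr > threshold

    # True from the first value above the threshold onwards (left to right)
    seen_from_left = np.logical_or.accumulate(above, axis=-1)
    # True up to the last value above the threshold (computed right to left)
    seen_from_right = np.logical_or.accumulate(above[..., ::-1], axis=-1)[..., ::-1]

    # Keep everything between the first and last value above the threshold,
    # including small values in between. Assumption: a row with no value
    # above the threshold ends up fully masked.
    keep = seen_from_left & seen_from_right
    return np.ma.array(arr, mask=~keep)


# The row from the question
row = np.array([0, 1, 1, 0, 57, 121, 120, 157, 77, 14, 0, 3, 0, 0, 0, 0,
                67, 100, 98, 97, 101, 129, 139, 105, 97, 105, 181, 126,
                10, 0, 0])

masked = mask_edges(row)
print(masked)        # edges are shown as --, the zeros in the middle are kept
print(masked.shape)  # same shape as the input
```

Unlike the NaN approach, this keeps the integer dtype, and the underlying values remain available through masked.data if they are needed later.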