I have an ndarray. In each row, I need to mask every number less than 50 until the first number greater than 50 is encountered. This should be done at both the beginning and the end of the row: as soon as the first number >50 is encountered, the masking should stop.
One row looks like:
[ 0 1 1 0 57 121 120 157 77 14 0 3 0 0 0 0 67 100
98 97 101 129 139 105 97 105 181 126 10 0 0]
I want something like:
[-- -- -- -- 57 121 120 157 77 14 0 3 0 0 0 0 67 100 98 97
101 129 139 105 97 105 181 126 -- -- --]
The masking should stop right before 57 at the start of the row, and right after 126 at the end.
I have tried ma.masked_where, but it masks the 0s in between as well, which I don't want.
So, is there a way to do this, or can you help me specify a range of indices, e.g. [0:40], so that only that range is masked?
I do not want the dimensions of the array to change after it is masked. Also, the presence of -- wouldn't make a difference to my purpose.
You can use either Boolean indexing or manual iteration. The former is more efficient for small arrays; the latter for large arrays with a small number of out-of-scope values either side.
import numpy as np
x = np.array([0, 0, 0, 2, 3, 51, 34, 1, 23, 32, 32, 52, 0, 0, 0])
start = (x > 50).argmax()
end = len(x) - (x[::-1] > 50).argmax()
print(x[start: end])
[51 34 1 23 32 32 52]
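One caveat with the argmax approach: on a Boolean array with no True values, argmax returns 0, so a row with nothing above 50 would be kept whole rather than treated as entirely out of scope. A minimal sketch of a guard, assuming an empty slice (0, 0) is the desired result in that case; the helper name bounds_above and its threshold parameter are illustrative, not part of NumPy:

```python
import numpy as np

def bounds_above(x, threshold=50):
    """Return (start, end) so that x[start:end] spans the first to the last value > threshold."""
    above = np.asarray(x) > threshold
    if not above.any():
        # Nothing exceeds the threshold: return an empty range (an assumption about the desired behaviour)
        return 0, 0
    start = int(above.argmax())                    # index of the first True
    end = len(above) - int(above[::-1].argmax())   # one past the index of the last True
    return start, end

print(bounds_above(np.array([0, 0, 0, 2, 3, 51, 34, 1, 23, 32, 32, 52, 0, 0, 0])))
print(bounds_above(np.array([1, 2, 3])))
```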
Using next with a generator expression and enumerate:
start = next(idx for idx, val in enumerate(x) if val > 50)
end = len(x) - next(idx for idx, val in enumerate(reversed(x)) if val > 50)
print(x[start: end])
[51 34 1 23 32 32 52]
If you wish to replace out-of-scope values with np.nan, you can assign accordingly, remembering to convert to float first, as NaN values are float:
x = x.astype(float)
x[:start] = np.nan
x[end:] = np.nan
print(x)
[nan nan nan nan nan 51. 34.  1. 23. 32. 32. 52. nan nan nan]
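If you want an actual masked array rather than NaN values, the same idea can be expressed as a Boolean mask, which keeps the original dtype and shape. The following is a sketch under stated assumptions: the helper name mask_edges and its threshold parameter are hypothetical, "greater than 50" is taken to be strict, and a row with no value above the threshold is masked entirely. It marks a position as in scope once a value above the threshold has been seen from both the left and the right, so it also works row by row on a 2D array.

```python
import numpy as np

def mask_edges(arr, threshold=50):
    """Mask leading and trailing values along the last axis until a value > threshold appears."""
    arr = np.asarray(arr)
    above = arr > threshold
    # True from the first value above the threshold onwards, scanning left to right
    seen_left = np.logical_or.accumulate(above, axis=-1)
    # True up to and including the last value above the threshold, scanning right to left
    seen_right = np.logical_or.accumulate(above[..., ::-1], axis=-1)[..., ::-1]
    # Everything outside the [first, last] span is masked
    return np.ma.masked_array(arr, mask=~(seen_left & seen_right))

row = np.array([0, 1, 1, 0, 57, 121, 120, 157, 77, 14, 0, 3, 0, 0, 0, 0, 67, 100,
                98, 97, 101, 129, 139, 105, 97, 105, 181, 126, 10, 0, 0])
masked = mask_edges(row)
print(masked)        # leading and trailing entries print as --
print(masked.shape)  # same shape as the input row
```

The zeros between 57 and 126 stay unmasked because they lie inside the span, which is the behaviour ma.masked_where alone does not give.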