Context
Since masking with the numpy.ma module is significantly slower than direct boolean masking, I have to use the latter for my argmin/argmax calculations.
A little comparison:
import numpy as np
# Masked Array
arr1 = np.ma.masked_array([12,4124,124,15,15], mask=[0,1,1,0,1])
# Boolean masking
arr2 = np.array([12,4124,124,15,15])
mask = np.array([0,1,1,0,1], dtype=bool)
%timeit arr1.argmin()
# 16.1 µs ± 4.88 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit arr2[mask].argmin()
# 946 ns ± 55.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
However, argmin/argmax returns the index of the first occurrence within the array it is called on. With boolean masking, that is the index within arr2[mask], not within arr2. And that is my problem: I need the index within the unmasked array while computing it on the masked array.
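To make the mismatch concrete, here is a minimal sketch that reuses arr2 and mask from the comparison above. The names selected, idx_in_selected and wanted_idx are only illustrative. Note also that in numpy.ma a True mask entry hides an element, whereas in boolean indexing True selects it, so arr1 and arr2[mask] above do not cover the same elements.

```python
import numpy as np

arr2 = np.array([12, 4124, 124, 15, 15])
mask = np.array([0, 1, 1, 0, 1], dtype=bool)

# Boolean indexing keeps only the elements where mask is True
selected = arr2[mask]                  # values 4124, 124, 15

# argmin is relative to the shortened array, not to arr2
idx_in_selected = selected.argmin()    # position 2 within `selected`

# The same value sits at a different position in arr2
wanted_idx = 4
print(idx_in_selected, wanted_idx, arr2[wanted_idx] == selected[idx_in_selected])
```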
Question
How can I get the argmin/argmax index within the unmasked arr2, even when I apply it to the boolean-masked version arr2[mask]?
Here's one approach based mostly on masking, specifically masking the mask. It should be memory-efficient and hopefully perform well too, especially on large arrays:
def reset_first_n_True(mask, n):
    # Resets (fills with False) the first n True places in mask.
    # Note: this modifies mask in place.

    # Count of True in original mask array
    c = np.count_nonzero(mask)

    # Set up a second mask that is assigned into the original mask at its
    # own True positions, so that the first n True values become False
    second_mask = np.ones(c, dtype=bool)
    second_mask[:n] = False
    mask[mask] = second_mask
# Use reduction function on masked data array
idx = np.argmin(random_array[random_mask])
reset_first_n_True(random_mask, idx)
out = random_mask.argmax()
To get the argmax on the masked data array and trace it back to the original position, only the first step changes:
idx = np.argmax(random_array[random_mask])
So any index-returning reduction, such as argmin or argmax, can be applied to the masked data and traced back to its original position this way.
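As a rough end-to-end sketch of the mask-the-mask idea, the snippet below builds example inputs and checks the result. The random data, the seed, the lowercase helper name reset_first_n_true and the work_mask copy are assumptions made for illustration. The copy is there only because the helper modifies its argument in place.

```python
import numpy as np

def reset_first_n_true(mask, n):
    """Set the first n True entries of mask to False, in place."""
    keep = np.ones(np.count_nonzero(mask), dtype=bool)
    keep[:n] = False
    mask[mask] = keep

# Hypothetical example data (not from the original post)
rng = np.random.default_rng(0)
random_array = rng.integers(0, 100, size=20)
random_mask = rng.random(20) < 0.5
random_mask[3] = True  # make sure at least one element is selected

# Work on a copy so random_mask stays usable afterwards
work_mask = random_mask.copy()

# Position of the minimum within the selected elements
idx = np.argmin(random_array[random_mask])

# Clear the first idx True entries; the first remaining True
# is then the original position of the selected minimum
reset_first_n_true(work_mask, idx)
out = work_mask.argmax()

print(out, random_array[out])
```

If the mask selects nothing, np.argmin raises on the empty selection, so a caller would probably want to check for that case first.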
If you are looking for a compact solution, use np.flatnonzero():
idx = np.flatnonzero(random_mask)
out = idx[random_array[random_mask].argmin()]
# Or idx[random_array[idx].argmin()]
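The compact version could be wrapped in a small helper, sketched here for 1-D arrays only. The function name masked_arg and its reducer parameter are hypothetical, and the empty-mask check is an added assumption about how one might want to fail.

```python
import numpy as np

def masked_arg(a, mask, reducer=np.argmin):
    """Return the index in 1-D array `a` chosen by `reducer` among mask-selected elements."""
    idx = np.flatnonzero(mask)          # original positions of the selected elements
    if idx.size == 0:
        raise ValueError("mask selects no elements")
    return idx[reducer(a[idx])]         # map the reduced position back to `a`

arr2 = np.array([12, 4124, 124, 15, 15])
mask = np.array([0, 1, 1, 0, 1], dtype=bool)

lo = masked_arg(arr2, mask)             # index in arr2 of the smallest selected value
hi = masked_arg(arr2, mask, np.argmax)  # index in arr2 of the largest selected value
print(lo, hi)  # 4 1
```

Indexing with idx instead of the boolean mask avoids scanning the mask a second time, which is the variant mentioned in the last comment above.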