python arrays numpy boolean-operations masked-array

Trace back original position of argmin/argmax on boolean masked NumPy array - Python

Context

Since masking with the numpy.ma-module is significantly slower than direct boolean masking, I have to use the latter for my argmin/argmax-calculations.

A little comparison:

import numpy as np

# Masked Array
arr1 = np.ma.masked_array([12,4124,124,15,15], mask=[0,1,1,0,1])

# Boolean masking
arr2 = np.array([12,4124,124,15,15])
mask = np.array([0,1,1,0,1], dtype=bool)  # np.bool was removed in NumPy 1.24; use the builtin bool

%timeit arr1.argmin()
# 16.1 µs ± 4.88 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit arr2[mask].argmin()
# 946 ns ± 55.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

However, argmin/argmax return the index of the first occurrence within the array they are called on. With boolean masking, that is the index within arr2[mask], not within arr2. And there is my problem: I need the index within the unmasked array, while the calculation itself runs on the masked array.
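To make the mismatch concrete, here is a minimal sketch using the same values as above. The variable names selected and idx_in_selected are only illustrative. Keep in mind that with boolean indexing a True entry selects an element, whereas in numpy.ma a True mask entry hides it.

```python
import numpy as np

arr2 = np.array([12, 4124, 124, 15, 15])
mask = np.array([0, 1, 1, 0, 1], dtype=bool)

# Boolean indexing keeps only the elements where mask is True: [4124, 124, 15]
selected = arr2[mask]

# argmin is computed on the compressed array, so the index refers to `selected`
idx_in_selected = selected.argmin()

print(idx_in_selected)        # 2
print(selected[idx_in_selected])  # 15, the minimum of the selected elements
print(arr2[idx_in_selected])      # 124, a different element of arr2
```

The selected minimum (15) actually sits at position 4 of arr2, which is the index being asked for.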

Question

How can I get the argmin/argmax-index of the unmasked arr2, even when I apply it to the boolean masked version arr2[mask]?

Solution

Here's one approach based mostly on masking, specifically on masking the mask. It should be memory-efficient and hopefully good on performance too, especially when dealing with large arrays -

def reset_first_n_True(mask, n):
    # Resets (fills with False) the first n True places in mask, in place

    # Count of True values in the original mask array
    c = np.count_nonzero(mask)

    # Set up a second mask with one entry per True position of the original
    # mask, whose first n entries are False. Assigning it into the True
    # positions of mask sets the first n True values to False.
    second_mask = np.ones(c, dtype=bool)
    second_mask[:n] = False
    mask[mask] = second_mask

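# random_array is the data array, random_mask the boolean mask (True = selected)
# Use the reduction function on the masked data array
idx = np.argmin(random_array[random_mask])
# Note: random_mask is modified in place here; work on random_mask.copy()
# instead if the original mask is still needed afterwards
reset_first_n_True(random_mask, idx)
# The first remaining True is the original position of the result
out = random_mask.argmax()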

To get the argmax on the masked data array and trace it back to its original position, only the first step changes:

idx = np.argmax(random_array[random_mask])

So any index-returning reduction (argmin, argmax and the like) can be applied to the masked data and its result traced back to the original position that way.
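As a rough worked example of the steps above, here is a sketch on a small array. The names data, mask and work are made up for illustration, and the mask is copied first so the caller's mask is not overwritten.

```python
import numpy as np

def reset_first_n_True(mask, n):
    # Set the first n True entries of mask to False, in place
    c = np.count_nonzero(mask)
    second_mask = np.ones(c, dtype=bool)
    second_mask[:n] = False
    mask[mask] = second_mask

# Illustrative data; True in the mask means "selected"
data = np.array([12, 4124, 124, 15, 15])
mask = np.array([False, True, True, False, True])

# Step 1: index of the minimum within the compressed array data[mask]
idx = np.argmin(data[mask])

# Step 2: clear the first idx True entries on a copy of the mask
work = mask.copy()
reset_first_n_True(work, idx)

# Step 3: the first True that is left marks the original position
out = work.argmax()

print(idx, out, data[out])  # 2 4 15
```

Here data[mask] is [4124, 124, 15], so the compressed index is 2. Clearing the first two True entries leaves only position 4 set, which argmax then returns.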

If you are looking for a compact solution, use np.flatnonzero() -

idx = np.flatnonzero(random_mask)
out = idx[random_array[random_mask].argmin()]
# Or idx[random_array[idx].argmin()]
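If this lookup is needed in several places, it could be wrapped in a small helper. The sketch below is only one way to do that: the function name masked_arg and its reducer parameter are hypothetical, not part of NumPy. It assumes a 1-D array and a mask that selects at least one element (np.argmin and np.argmax raise a ValueError on an empty selection).

```python
import numpy as np

def masked_arg(arr, mask, reducer=np.argmin):
    # Hypothetical helper: apply an index-returning reducer (np.argmin,
    # np.argmax, ...) to the selected elements of a 1-D array and return
    # the matching index in the original, unmasked array.
    positions = np.flatnonzero(mask)       # original indices where mask is True
    return positions[reducer(arr[positions])]

data = np.array([12, 4124, 124, 15, 15])
mask = np.array([False, True, True, False, True])

i_min = masked_arg(data, mask)             # original index of the selected minimum
i_max = masked_arg(data, mask, np.argmax)  # original index of the selected maximum

print(i_min, i_max)  # 4 1
```

Because np.flatnonzero returns indices in ascending order, ties resolve to the first occurrence in the original array, just as they would with a plain argmin or argmax.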