
Trace back original position of argmin/argmax on boolean masked NumPy array - Python


Context

Since masking with the numpy.ma-module is significantly slower than direct boolean masking, I have to use the latter for my argmin/argmax-calculations.

A little comparison:

import numpy as np

# Masked Array
arr1 = np.ma.masked_array([12,4124,124,15,15], mask=[0,1,1,0,1])

# Boolean masking
arr2 = np.array([12,4124,124,15,15])
mask = np.array([0,1,1,0,1], dtype=bool)  # builtin bool; the old np.bool alias is not available in every NumPy version

%timeit arr1.argmin()
# 16.1 µs ± 4.88 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit arr2[mask].argmin()
# 946 ns ± 55.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
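
One detail worth keeping in mind when reading the comparison: the two snippets do not select the same elements. In numpy.ma a True mask entry hides the element, while in boolean indexing a True entry keeps it. The following is a small sketch of that difference using the arrays from the question; the names data, flags, visible_ma, kept_bool and same_as_ma are introduced here only for illustration.

```python
import numpy as np

data = [12, 4124, 124, 15, 15]
flags = [0, 1, 1, 0, 1]

# numpy.ma: a True mask entry hides the element.
arr1 = np.ma.masked_array(data, mask=flags)
visible_ma = arr1.compressed()      # the elements that are NOT masked

# Boolean indexing: a True mask entry keeps the element.
arr2 = np.array(data)
mask = np.array(flags, dtype=bool)
kept_bool = arr2[mask]

# Inverting the boolean mask reproduces the masked-array selection.
same_as_ma = arr2[~mask]

print(visible_ma.tolist())   # [12, 15]
print(kept_bool.tolist())    # [4124, 124, 15]
print(same_as_ma.tolist())   # [12, 15]
```

This does not change the timing argument, but it means arr1.argmin() and arr2[mask].argmin() answer different questions unless one of the masks is inverted.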

Anyhow, argmin/argmax return the index of the first occurrence within the array they are called on. With boolean masking, that is the index within arr2[mask], not within arr2. And there is my problem: I need the index within the unmasked array, while calculating it on the masked array.
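
To make the mismatch concrete, here is a minimal sketch with the arrays from above. The names selected and idx_in_selected are introduced only for this illustration.

```python
import numpy as np

arr2 = np.array([12, 4124, 124, 15, 15])
mask = np.array([0, 1, 1, 0, 1], dtype=bool)

selected = arr2[mask]                  # array([4124, 124, 15])
idx_in_selected = selected.argmin()    # position within arr2[mask]

print(idx_in_selected)                 # 2
print(selected[idx_in_selected])       # 15, the minimum of the selection
print(arr2[idx_in_selected])           # 124, a different element of arr2
```

The index 2 is only valid for arr2[mask]; the wanted position of the value 15 within arr2 is 4.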


Question

How can I get the argmin/argmax-index of the unmasked arr2, even when I apply it to the boolean masked version arr2[mask]?


Solution

  • Here's one approach based mostly on masking, specifically mask-the-mask. It should be memory-efficient and hopefully performs well too, especially on large arrays:

    def reset_first_n_True(mask, n):
        # Resets (fills with False) the first n True places in mask.
        # Note: mask is modified in place; pass a copy to keep the original.

        # Count of True in original mask array
        c = np.count_nonzero(mask)

        # Set up a second mask that is assigned into the original mask at
        # its own True positions, so that the first n True values become False
        second_mask = np.ones(c, dtype=bool)
        second_mask[:n] = False
        mask[mask] = second_mask

    # random_array is the data array and random_mask its boolean mask
    # (arr2 and mask in the question)

    # Use reduction function on masked data array
    idx = np.argmin(random_array[random_mask])
    reset_first_n_True(random_mask, idx)
    # The first True left in the mask marks the original position
    out = random_mask.argmax()
    

    To get the argmax on the masked data array and trace it back to its original position, only the first step changes:

    idx = np.argmax(random_array[random_mask])
    

    So any index-returning reduction (argmin, argmax and the like) can be applied to the masked data and traced back to its original position that way.
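
As a worked example, the mask-the-mask steps can be run on the arrays from the question. This is a sketch rather than the answer's exact code: trace_back_index is a hypothetical helper name, and unlike reset_first_n_True it works on a copy so that the caller's mask stays unchanged.

```python
import numpy as np

def trace_back_index(mask, idx_in_selected):
    """Map a position within arr[mask] to the position within arr (1-D)."""
    work = mask.copy()                        # keep the caller's mask intact
    keep = np.ones(np.count_nonzero(work), dtype=bool)
    keep[:idx_in_selected] = False            # drop the first idx True entries
    work[work] = keep                         # write back onto the True slots
    return int(work.argmax())                 # first True that is left

arr2 = np.array([12, 4124, 124, 15, 15])
mask = np.array([0, 1, 1, 0, 1], dtype=bool)

idx_min = arr2[mask].argmin()                 # 2, position within arr2[mask]
out_min = trace_back_index(mask, idx_min)     # position within arr2

idx_max = arr2[mask].argmax()                 # 0, position within arr2[mask]
out_max = trace_back_index(mask, idx_max)     # position within arr2

print(out_min, arr2[out_min])                 # 4 15
print(out_max, arr2[out_max])                 # 1 4124
```

Copying the mask costs one extra boolean array; if that matters, the in-place variant above avoids it at the price of altering the mask.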


    If you are looking for a compact solution, use np.flatnonzero():

    idx = np.flatnonzero(random_mask)
    out = idx[random_array[random_mask].argmin()]
    # Or idx[random_array[idx].argmin()]
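
The compact variant can be wrapped into a small helper and checked against the arrays from the question. This is a minimal sketch assuming 1-D input and a mask with at least one True entry; masked_arg and its reducer parameter are hypothetical names, not part of NumPy.

```python
import numpy as np

def masked_arg(arr, mask, reducer=np.argmin):
    """Return the index within arr of the element chosen by reducer on arr[mask]."""
    positions = np.flatnonzero(mask)          # original indices of the True entries
    return int(positions[reducer(arr[positions])])

arr2 = np.array([12, 4124, 124, 15, 15])
mask = np.array([0, 1, 1, 0, 1], dtype=bool)

print(masked_arg(arr2, mask))                 # 4
print(masked_arg(arr2, mask, np.argmax))      # 1
```

Here positions holds the original indices of the selected elements, so the index returned by the reduction only has to be looked up in it.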