Tags: python, arrays, numpy, merge, mask

Merge multiple masks with different sizes


I want to merge multiple masks on a large array, where the masks do not all have the same size. The second mask is built after applying the first mask, and so on for an arbitrary number of masks. As an example, let's say we have the following array and create a mask from it:

import numpy as np

A = np.arange(10)
mask1 = (A <= 5)

Now we want to apply another mask, but only on the data going through mask1, like this:

mask2 = (A[mask1] % 2 == 0)

To get the data that passes both masks, you could do:

D = A[mask1][mask2]

However, with an arbitrary number of masks, each applied after the previous one, this gets pretty cumbersome. Is there a convenient way to merge the masks even though they are not the same size, given that they are all constructed from the same array?

Obviously, I could do,

mask = (A <= 5) & (A % 2 == 0)

but that is not possible with the data I am working with, as I need to apply the masks progressively; otherwise it would simply be too slow.

Thanks in advance.


Solution

  • You could store the valid indices and, at each iteration, index into the previously stored indices with the positions selected by the current mask. Because each mask is defined relative to the previous selection, the result is the current selection expressed as positions in the original input array.

    Thus, we could do -

    idx1 = np.flatnonzero(mask1) # Store indices
    idx2 = np.flatnonzero(mask2)
    final_idx = idx1[idx2]
    

    We would use final_idx to index into the input array for the final selection.

    To extend that to a generic number of masks, the iterative process would look something like this -

    list_of_masks = [mask1,mask2,mask3]
    idx = np.arange(A.shape[0])
    for m in list_of_masks:
        idx = idx[np.flatnonzero(m)]
    

    Sample run -

    In [104]: A = np.arange(20)
    
    In [105]: # Let's create three iterative masks
         ...: mask1 = (A <= 5)
         ...: mask1[1] = 0
         ...: mask1[2] = 0
         ...: mask2 = (A[mask1] % 2 == 0)
         ...: mask3 = (A[mask1][mask2] % 3 == 0)
         ...: 
    
    In [106]: A[mask1][mask2][mask3] # Original approach
    Out[106]: array([0])
    
    In [107]: list_of_masks = [mask1,mask2,mask3]
         ...: idx = np.arange(A.shape[0])
         ...: for m in list_of_masks:
         ...:     idx = idx[np.flatnonzero(m)]
         ...:     
    
    In [108]: A[idx] # New approach to use final idx
    Out[108]: array([0])
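
As a rough sketch of how the loop above could be packaged, the snippet below wraps it in a helper and also converts the final indices back into a single full-size boolean mask. The name merge_masks, the length check, and the sample thresholds (A <= 12) are assumptions made for illustration and are not part of the original answer. It indexes idx with each boolean mask directly, which selects the same positions as idx[np.flatnonzero(m)].

```python
import numpy as np


def merge_masks(n, masks):
    """Compose successively applied boolean masks into indices of the original array.

    n     -- length of the original 1-D array
    masks -- masks[0] has length n; each later mask has one entry per
             element that survived the previous mask
    (hypothetical helper, named here only for illustration)
    """
    idx = np.arange(n)
    for m in masks:
        m = np.asarray(m, dtype=bool)
        if m.shape[0] != idx.shape[0]:
            raise ValueError("mask length does not match the number of surviving elements")
        # Boolean indexing keeps the surviving positions in the original array
        idx = idx[m]
    return idx


A = np.arange(20)

# Three masks, each built on the output of the previous one
mask1 = (A <= 12)
mask2 = (A[mask1] % 2 == 0)
mask3 = (A[mask1][mask2] % 3 == 0)

idx = merge_masks(A.shape[0], [mask1, mask2, mask3])

# Optional: turn the indices into one mask with the same size as A
merged = np.zeros(A.shape[0], dtype=bool)
merged[idx] = True

print(A[idx])     # same selection as A[mask1][mask2][mask3]
print(A[merged])  # same selection again, via a single full-size mask
```

Keeping indices rather than a full-size mask during the loop means each step only touches the elements that are still selected, which fits the progressive use case in the question; the full-size mask is built once at the end if it is needed.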