python numpy vectorization

NumPy: Find first n columns according to mask


Say I have an array arr of shape (m, n) and a boolean array mask of the same shape. For each row, I would like to obtain the first N elements of arr whose corresponding entry in mask is True.

An example:

arr = np.array([[1,2,3,4,5],
                [6,7,8,9,10],
                [11,12,13,14,15]])

mask = np.array([[False, True, True, True, True],
                 [True, False, False, True, False],
                 [True, True, False, False, False]])

N = 2

Given the above, I would like to write a (vectorized) function that outputs the following:

output = maskify_n_columns(arr, mask, N)
# expected: np.array([[2, 3], [6, 9], [11, 12]])
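
To pin down the intended behaviour, here is a plain, non-vectorized sketch of the same selection. The name maskify_n_columns_loop is hypothetical, and the sketch assumes every row of mask has at least N True entries (otherwise the rows would come out with different lengths).

```python
import numpy as np

def maskify_n_columns_loop(arr, mask, N):
    # Hypothetical reference version, not vectorized.
    # For each row, keep the values whose mask entry is True, then take the first N.
    # Assumes every row of mask contains at least N True entries.
    rows = [row[row_mask][:N] for row, row_mask in zip(arr, mask)]
    return np.array(rows)

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10],
                [11, 12, 13, 14, 15]])

mask = np.array([[False, True, True, True, True],
                 [True, False, False, True, False],
                 [True, True, False, False, False]])

result = maskify_n_columns_loop(arr, mask, 2)
print(result)
```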

Solution

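  • You can combine broadcasting with numpy.cumsum() and numpy.argmax(): the running count of True values along each row first reaches k at the column of the k-th True value, and argmax returns that first column.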

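    import numpy as np

    def maskify_n_columns(arr, mask, N):
        # Column where each row's running True count first reaches 1..N; shape (m, N).
        cols = (mask.cumsum(axis=1)[..., None] == np.arange(1, N + 1)).argmax(axis=1)
        # Pair every row index with its N selected column indices.
        return arr[np.arange(arr.shape[0])[:, None], cols]

    maskify_n_columns(arr, mask, 2)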
    

    Output:

    [[ 2  3]
     [ 6  9]
     [11 12]]
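
To make the broadcasting step easier to follow, the sketch below rebuilds the intermediate arrays on the example data. It also adds a guarded variant, since argmax returns 0 when a row has fewer than N True values and would then silently select column 0. The names counts, hits, cols, sparse_mask and maskify_n_columns_filled, as well as the fill parameter, are illustrative assumptions and not part of the answer above.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10],
                [11, 12, 13, 14, 15]])
mask = np.array([[False, True, True, True, True],
                 [True, False, False, True, False],
                 [True, True, False, False, False]])
N = 2

# Running count of True values along each row, shape (m, n).
counts = mask.cumsum(axis=1)

# hits[i, j, k] is True where the running count at row i, column j equals k + 1.
# Broadcasting (m, n, 1) against (N,) gives shape (m, n, N).
hits = counts[..., None] == np.arange(1, N + 1)

# argmax over the column axis picks the first column where the count equals k + 1,
# which is the column of the (k + 1)-th True value. Shape (m, N).
cols = hits.argmax(axis=1)
print(cols)

def maskify_n_columns_filled(arr, mask, N, fill=-1):
    # Hypothetical variant: positions with no k-th True value get `fill`
    # instead of the value from column 0.
    hits = mask.cumsum(axis=1)[..., None] == np.arange(1, N + 1)
    cols = hits.argmax(axis=1)
    picked = arr[np.arange(arr.shape[0])[:, None], cols]
    # hits.any(axis=1) is False exactly where a row has fewer than k + 1 True values.
    return np.where(hits.any(axis=1), picked, fill)

# A mask whose first row has one True value and whose last row has none.
sparse_mask = np.array([[False, True, False, False, False],
                        [True, False, False, True, False],
                        [False, False, False, False, False]])
filled = maskify_n_columns_filled(arr, sparse_mask, N)
print(filled)
```

Whether a fill value, a masked array, or raising an error is the right response to short rows depends on the use case; the fill value here is only one option.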