Tags: python, arrays, numpy, performance, vectorization

Python: how to get all the first values row-wise from a 2D numpy array when using a 2D boolean mask


I have two large 2D arrays of the same shape: one holds the values and the other is a boolean mask marking which values are "valid".

vals = np.array([
    [5, 2, 4],
    [7, 8, 9],
    [1, 3, 2],
])

valid = np.array([
    [False, True, True],
    [False, False, True],
    [False, True, True],
])

My goal is to get, for each row, the first value where valid is True, producing a vector like [2, 9, 3], as fast as possible.

I tried indexing with the mask directly, but that flattens the result and loses the row structure:

vals[valid]
> array([2, 4, 9, 3, 2])

I also tried looping over all the indices, but I am wondering whether there is a faster, vectorized way of doing this. Thank you!


Solution

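  • np.argmax on a boolean array returns the index of the first True in each row (ties go to the first occurrence), so you can use it as the column index:

    vals[np.arange(len(vals)), np.argmax(valid, axis=1)]

    Or use np.take_along_axis:

    np.take_along_axis(vals, np.argmax(valid, axis=1)[:, None], axis=1).ravel()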
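
One caveat worth noting: for a row that contains no True at all, np.argmax returns 0, so the expressions above would silently return that row's first value. Below is a minimal sketch of a guard, assuming a sentinel fill value is acceptable for such rows. The helper name first_valid and its fill parameter are illustrative only and not part of the original answer.

```python
import numpy as np

vals = np.array([
    [5, 2, 4],
    [7, 8, 9],
    [1, 3, 2],
])

valid = np.array([
    [False, True, True],
    [False, False, True],
    [False, True, True],
])


def first_valid(vals, valid, fill=-1):
    """Hypothetical helper: first valid value per row, `fill` where a row has none."""
    # Column index of the first True in each row (0 if the row has no True).
    idx = np.argmax(valid, axis=1)
    # Pick one element per row using paired row/column indices.
    picked = vals[np.arange(len(vals)), idx]
    # Rows with no True get the sentinel instead of a misleading value.
    has_valid = valid.any(axis=1)
    return np.where(has_valid, picked, fill)


result = first_valid(vals, valid)

# Same data, but with the second row fully masked out.
valid_with_empty_row = valid.copy()
valid_with_empty_row[1] = False
result_with_empty_row = first_valid(vals, valid_with_empty_row)
```

If every row is guaranteed to have at least one valid entry, the guard is unnecessary and the one-liners in the answer are enough.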