
Numpy array loses shape after applying mask across axis


Problem

I have a NumPy array and a boolean mask of the same shape. Once I apply the mask, the array loses its shape and becomes a flattened, one-dimensional array.

Question

I want to reduce my array along one axis, based on a 1D mask whose length matches that axis.

How can I apply a mask, but keep dimensionality of the array?

Example

A small example in code:

# data ...
>>> import numpy as np
>>> data = np.ones((4, 4))
>>> data.shape
(4, 4)

# mask ...
>>> mask = np.ones((4, 4), dtype=bool)
>>> mask.shape
(4, 4)

# apply mask ...
>>> data[mask].shape
(16,)

My ideal shape would be (4, 4).

An example with array dimension reduction across an axis:

# data, mask ...
>>> data = np.ones((4, 4))
>>> mask = np.ones((4, 4), dtype=bool)

# remove last column from data ...
>>> mask[:, 3] = False 
>>> mask
array([[ True,  True,  True, False],
       [ True,  True,  True, False],
       [ True,  True,  True, False],
       [ True,  True,  True, False]])

# equivalent mask in 1D ...
>>> mask[0]
array([ True,  True,  True, False])

# apply mask ...
>>> data[mask].shape 
(12,)

Ideally, the result would have shape (4, 3) without an explicit reshape.

Help is appreciated, thanks!


Solution

  • The 'correct' way to achieve your goal is not to expand the mask to 2D. Instead, index with [:, mask] using the 1D mask. This tells NumPy to leave axis 0 unchanged and apply the mask along axis 1.

    import numpy as np

    a = np.arange(12).reshape(3, 4)
    b = np.array((1, 0, 1, 0), dtype=bool)  # same as passing the dtype code '?'
    a
    # array([[ 0,  1,  2,  3],
    #        [ 4,  5,  6,  7],
    #        [ 8,  9, 10, 11]])
    b
    # array([ True, False,  True, False])
    a[:, b]
    # array([[ 0,  2],
    #        [ 4,  6],
    #        [ 8, 10]])
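
    The same [:, mask] idea extends to any axis. Below is a minimal sketch (the helper name mask_along_axis is hypothetical, not a NumPy function) that builds the index tuple programmatically, and cross-checks the result against np.compress, which selects slices along an axis from a 1D condition:

```python
import numpy as np

def mask_along_axis(arr, mask, axis):
    """Hypothetical helper: keep the slices of `arr` along `axis` where `mask` is True."""
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 1 or mask.shape[0] != arr.shape[axis]:
        raise ValueError("mask must be 1D with length arr.shape[axis]")
    # Full slices on every axis except `axis`, which gets the boolean mask.
    index = [slice(None)] * arr.ndim
    index[axis] = mask
    return arr[tuple(index)]

a = np.arange(12).reshape(3, 4)
b = np.array([True, False, True, False])

cols = mask_along_axis(a, b, axis=1)   # same as a[:, b]
print(cols.shape)  # (3, 2)
# np.compress with a 1D condition along an axis gives the same array.
assert np.array_equal(cols, np.compress(b, a, axis=1))

rows = mask_along_axis(a, [True, False, True], axis=0)   # same as a[mask, :]
print(rows.shape)  # (2, 4)
```

    Because only one axis is indexed with a boolean array, the number of dimensions is preserved.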
    

    If your mask is already 2D, NumPy won't check whether all its rows are identical, because that would be inefficient. You can, however, simply use [:, mask[0]] in that case.
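
    Applied to the question's example, that could look like the sketch below. It assumes you want an explicit check that the rows really are identical before collapsing the mask to mask[0]; the helper name apply_uniform_row_mask is hypothetical:

```python
import numpy as np

def apply_uniform_row_mask(data, mask2d):
    """Hypothetical helper: apply a 2D mask whose rows are all identical, keeping data 2D."""
    mask2d = np.asarray(mask2d, dtype=bool)
    row = mask2d[0]
    # NumPy does not verify this for you, so check it explicitly.
    if not np.all(mask2d == row):
        raise ValueError("mask rows differ; cannot reduce it to a 1D column mask")
    return data[:, row]

data = np.ones((4, 4))
mask = np.ones((4, 4), dtype=bool)
mask[:, 3] = False  # drop the last column, as in the question

result = apply_uniform_row_mask(data, mask)
print(result.shape)  # (4, 3)
```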

    If your mask is 2D and just happens to have the same number of Trues in each row, either use @tel's answer or create an index array:

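    # Example 2D mask: broadcasting b (shape (4,)) against b[:3, None] (shape (3, 1))
    # gives a (3, 4) mask with two Trues in every row.
    B = b ^ b[:3, None]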
    B
    # array([[False,  True, False,  True],
    #        [ True, False,  True, False],
    #        [False,  True, False,  True]])
    # Column indices of the Trues, grouped by row (np.where returns them in row-major order).
    J = np.where(B)[1].reshape(len(B), -1)
    

    And now either

    np.take_along_axis(a, J, 1)
    # array([[ 1,  3],
    #        [ 4,  6],
    #        [ 9, 11]])
    

    or

    I = np.arange(len(J))[:, None]
    IJ = I, J
    a[IJ]
    # array([[ 1,  3],
    #        [ 4,  6],
    #        [ 9, 11]])
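
    The index-array steps can be wrapped into one function that also verifies the "same number of Trues per row" assumption instead of silently producing a wrong reshape. This is a sketch only; the name compress_rows is hypothetical. It also shows that a[B].reshape(len(B), -1) yields the same array, since boolean indexing visits the mask in row-major order too:

```python
import numpy as np

def compress_rows(a, mask2d):
    """Hypothetical helper: keep the True entries of each row as a (rows, k) array.

    Requires every row of `mask2d` to contain the same number k of Trues.
    """
    mask2d = np.asarray(mask2d, dtype=bool)
    counts = mask2d.sum(axis=1)
    if not np.all(counts == counts[0]):
        raise ValueError("rows have different numbers of True values")
    # Column indices of the Trues; np.nonzero returns them in row-major order.
    J = np.nonzero(mask2d)[1].reshape(len(mask2d), -1)
    return np.take_along_axis(a, J, axis=1)

a = np.arange(12).reshape(3, 4)
b = np.array([True, False, True, False])
B = b ^ b[:3, None]

out = compress_rows(a, B)
print(out)
# [[ 1  3]
#  [ 4  6]
#  [ 9 11]]

# Boolean indexing flattens in the same row-major order, so this matches.
assert np.array_equal(out, a[B].reshape(len(B), -1))
```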