Valid generic code to index 2D or 1D masked arrays into 1D arrays in NumPy


I would like code that works on either a 2D or a 1D masked array and extracts a 1D array from it. In the 2D case, one column is entirely masked and should be removed (this can be done as shown in this question for example).

import numpy as np

a = np.ma.masked_array(range(10*2), mask=[True, False]*10).reshape(10,2)
a = np.ma.masked_equal(a, 13)
b = np.ma.masked_equal(np.array(range(10)), 3)

print(a)
print(b)
# [[-- 1]
#  [-- 3]
#  [-- 5]
#  [-- 7]
#  [-- 9]
#  [-- 11]
#  [-- --]
#  [-- 15]
#  [-- 17]
#  [-- 19]]
# [0 1 2 -- 4 5 6 7 8 9]

# HERE I would like the same indexing valid for both (2D and 1D) situations:
a = a[:, ~np.all(a.mask, axis=0)].squeeze()
b = b[:] # I am not supposed to know that b is actually 1D and not a problematic 2D array

print(a)
print(b)
# [1 3 5 7 9 11 -- 15 17 19]
# [0 1 2 -- 4 5 6 7 8 9]
print(a-b)
# [1 2 3 -- 5 6 -- 8 9 10]

What would be valid, Pythonic code to achieve this?

Sub-question: to my surprise, during my attempts the following did work:

b = b[:, ~np.all(b.mask, axis=0)].squeeze()
print(b)
# [0 1 2 -- 4 5 6 7 8 9]

Why don't I get an "IndexError: too many indices for array" error when I use 2D indexing on this 1D array?

Is there any better option to address the original question? Thanks!


Solution

  • You can use a = a[:, ~np.all(a.mask, axis=0)].squeeze() for both cases (1D and 2D).

    In the 1D case of your example, b[:, ~np.all(b.mask, axis=0)] evaluates to b[:, True], because np.all(b.mask, axis=0) is False for an array that is not fully masked. One might expect this to raise an IndexError, but a scalar True index behaves like np.newaxis here: b[:, True] is an array of shape (10, 1). See this SO answer for why this is so and what the motivation behind it is (the answer covers the 0-dimensional case, but higher dimensions work the same way). squeeze then removes the extra dimension, which is why you never noticed it.
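
The behaviour described above can be checked with a short sketch. The helper name drop_masked_columns is hypothetical and not part of NumPy, and np.ma.getmaskarray is used instead of .mask on the assumption that the input may carry no explicit mask. Treat this as an illustration of the idea rather than a definitive implementation.

```python
import numpy as np


def drop_masked_columns(arr):
    """Return a 1D masked array from a 1D or 2D masked array.

    Hypothetical helper wrapping the expression from the answer.
    """
    # 2D input: boolean vector flagging the columns that are not fully masked.
    # 1D input: a single boolean scalar; when True it indexes like np.newaxis.
    keep = ~np.all(np.ma.getmaskarray(arr), axis=0)
    # The intermediate result is 2D in both cases; squeeze() drops the
    # length-1 axis again.
    return arr[:, keep].squeeze()


# Same test data as in the question.
a = np.ma.masked_array(range(20), mask=[True, False] * 10).reshape(10, 2)
a = np.ma.masked_equal(a, 13)
b = np.ma.masked_equal(np.arange(10), 3)

print(b[:, True].shape)        # (10, 1)
print(drop_masked_columns(a))  # [1 3 5 7 9 11 -- 15 17 19]
print(drop_masked_columns(b))  # [0 1 2 -- 4 5 6 7 8 9]
```

One caveat with this approach: for a fully masked 1D input the scalar index is False rather than True, which selects nothing, so the result is empty instead of a length-preserving 1D array. If that case matters, branching explicitly on arr.ndim may be the safer choice.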