Understanding weird boolean 2d-array indexing behavior in numpy


Why does this work:

import numpy as np

a = np.random.rand(10, 20)
x_range = np.arange(10)
y_range = np.arange(20)

a_tmp = a[x_range<5,:]
b = a_tmp[:, np.in1d(y_range, [3,4,8])]

and this does not:

a = np.random.rand(10,20)
x_range = np.arange(10)
y_range = np.arange(20)

b = a[x_range<5, np.in1d(y_range,[3,4,8])]  # raises IndexError (shape mismatch)

Solution

  • The NumPy reference documentation's page on indexing contains the answer, but it requires a bit of careful reading.

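    The answer here is that indexing with boolean arrays is equivalent to indexing with the integer arrays obtained by first passing each boolean array through np.nonzero. Note that nonzero returns a tuple with one index array per dimension, so for 1-d boolean arrays m1, m2

    a[m1, m2] == a[m1.nonzero()[0], m2.nonzero()[0]]
    

    which (when it succeeds, i.e., when the two index arrays can be broadcast together, typically m1.nonzero()[0].shape == m2.nonzero()[0].shape) pairs the indices up element by element and is equivalent to:

    [a[i, j] for i, j in zip(m1.nonzero()[0], m2.nonzero()[0])]
    

    In the question, x_range<5 has five True entries and the in1d mask has three, so the index arrays have shapes (5,) and (3,), cannot be broadcast together, and NumPy raises an IndexError.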
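
    As a quick check of how two boolean masks are paired up, here is a minimal sketch. It assumes a deterministic np.arange array in place of the random one so the values are easy to read, and it uses np.isin in place of np.in1d, which newer NumPy releases deprecate. The names rows, cols, paired, by_hand, m1_bad and mismatch_raises are only illustrative.

```python
import numpy as np

# Deterministic stand-in for the random array: a[i, j] == 20 * i + j
a = np.arange(200).reshape(10, 20)

m1 = np.arange(10) < 3                    # True at rows 0, 1, 2
m2 = np.isin(np.arange(20), [3, 4, 8])    # True at columns 3, 4, 8

# nonzero() returns a tuple; take the first (only) index array
rows = m1.nonzero()[0]
cols = m2.nonzero()[0]

# Two masks with the same number of True entries are paired element by element
paired = a[m1, m2]
by_hand = np.array([a[i, j] for i, j in zip(rows, cols)])

# Masks with 5 and 3 True entries cannot be paired, as in the question
m1_bad = np.arange(10) < 5
try:
    a[m1_bad, m2]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True
```

    With these masks, paired picks out a[0, 3], a[1, 4] and a[2, 8] rather than a 3x3 block, and the mismatched masks raise the same IndexError as the second snippet in the question.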
    

    I'm not sure why it was designed to work like this; usually, this is not what you'd want.

    To get the more intuitive result, you can instead do

    a[np.ix_(m1, m2)]
    

    which produces a result equivalent to

    [[a[i,j] for j in range(a.shape[1]) if m2[j]] for i in range(a.shape[0]) if m1[i]]
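
    To see that np.ix_ gives the same block as the two-step indexing from the question, here is a small sketch under the same assumptions as the question (a 10x20 array, the first five rows, columns 3, 4 and 8). The np.arange array and the names block, two_step and by_hand are illustrative, and np.isin stands in for np.in1d.

```python
import numpy as np

# Deterministic stand-in for the random array: a[i, j] == 20 * i + j
a = np.arange(200).reshape(10, 20)

m1 = np.arange(10) < 5                    # first five rows
m2 = np.isin(np.arange(20), [3, 4, 8])    # columns 3, 4, 8

# np.ix_ builds an open mesh, so every selected row meets every selected column
block = a[np.ix_(m1, m2)]

# The working two-step version from the question
two_step = a[m1, :][:, m2]

# The nested comprehension from the answer, converted to an array for comparison
by_hand = np.array([[a[i, j] for j in range(a.shape[1]) if m2[j]]
                    for i in range(a.shape[0]) if m1[i]])

print(block.shape)
```

    All three results have shape (5, 3) and hold the same values, so np.ix_ is a one-step replacement for indexing the rows first and the columns second.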