python numpy indexing array-broadcasting

Understanding Numpy Multi-dimensional Array Indexing

Please, can anyone explain the difference between these three indexing operations:

y = np.arange(35).reshape(5,7)

# Operation 1
y[np.array([0,2,4]),1:3]
# Operation 2
y[np.array([0,2,4]), np.array([[1,2]])]
# Operation 3
y[np.array([0,2,4]), np.array([[1],[2]])]

What I don't get is:

why Operation 2 is not working while Operation 1 is working fine?
why Operation 3 is working, but returning the transpose of what I expect (that is, the result of Operation 1)?

According to the numpy reference:

If the index arrays do not have the same shape, there is an attempt to broadcast them to the same shape. If they cannot be broadcast to the same shape, an exception is raised.

Ok, so this means I cannot do:

y[np.array([0,2,4]), np.array([1,2])]

But numpy reference also says about Operation 1:

In effect, the slice is converted to an index array np.array([[1,2]]) (shape (1,2)) that is broadcast with the index array to produce a resultant array of shape (3,2).

So why can not I do:

y[np.array([0,2,4]), np.array([[1,2]])]

I get the error:

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (1,2)

Solution

In [1]: import numpy as np; y = np.arange(35).reshape(5,7)

Operation 1

In [2]: y[np.array([0,2,4]), 1:3]
Out[2]: 
array([[ 1,  2],
       [15, 16],
       [29, 30]])

Here we have a mix of advanced indexing (with the array) and basic indexing (with the slice) with only one advanced index. According to the reference

[a] single advanced index can for example replace a slice and the result array will be the same [...]

And that is true as the following code shows:

In [3]: y[::2, 1:3]
Out[3]: 
array([[ 1,  2],
       [15, 16],
       [29, 30]])

The only difference between Out[2] and Out[3] is that the former is a copy of the data in y (advanced indexing always produces a copy) while the latter is a view sharing the same memory with y (basic indexing only always produces a view).

So with operation 1 we have selected the rows via np.array([0,2,4]) and the columns via 1:3.

Operation 2

In [4]: y[np.array([0,2,4]), np.array([[1,2]])]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-4-bf9ee1361144> in <module>()
----> 1 y[np.array([0,2,4]), np.array([[1,2]])]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (1,2)

This fails and to understand why we first have to realize that the very nature of indexing in this example is fundamentally different from operation 1. Now we have advanced indexing only (and more than a single advanced index). This means that the indexing arrays must have the same shape or at least shapes that are compatible for broadcasting. Lets have a look at the shapes.

In [5]: np.array([0,2,4]).shape
Out[5]: (3,)
In [6]: np.array([[1,2]]).shape
Out[6]: (1, 2)

This means that the broadcasting mechanism will try to combine those two arrays:

np.array([0,2,4])  (1d array):     3
np.array([[1,2]])  (2d array): 1 x 2
Result             (2d array): 1 x F

The F at the end of the last line indicates that the shapes are not compatible. This is the reason for the IndexError in operation 2.

Operation 3

In [7]: y[np.array([0,2,4]), np.array([[1],[2]])]
Out[7]: 
array([[ 1, 15, 29],
       [ 2, 16, 30]])

Again, we have advanced indexing only. Lets see if the shapes are compatible now:

In [8]: np.array([0,2,4]).shape
Out[8]: (3,)
In [9]: np.array([[1],[2]]).shape
Out[9]: (2, 1)

This means that broadcasting will work like this:

np.array([0,2,4])     (1d array):     3
np.array([[1],[2]])   (2d array): 2 x 1
Result                (2d array): 2 x 3

So now broadcasting works! And since our indexing arrays are broadcast to a 2x3 array, this will also be the shape of the result. So it also explains the shape of the result which is different than that of operation 1.

To get a result of shape 3x2 as in operation 1, we can do

In [10]: y[np.array([[0],[2],[4]]), np.array([1, 2])]
Out[10]: 
array([[ 1,  2],
       [15, 16],
       [29, 30]])

Now the broadcasting mechanism works like this:

np.array([[0],[2],[4]])  (2d array): 3 x 1
np.array([1, 2])         (1d array):     2
Result                   (2d array): 3 x 2

giving a 3x2 array. Instead of np.array([1, 2]) also

In [11]: y[np.array([[0],[2],[4]]), np.array([[1, 2]])]
Out[11]: 
array([[ 1,  2],
       [15, 16],
       [29, 30]])

would work because of

np.array([[0],[2],[4]])  (2d array): 3 x 1
np.array([[1, 2]])       (2d array): 1 x 2
Result                   (2d array): 3 x 2