Swapping the dimensions of a numpy array using Ellipsis?

This code swaps the first and last channels of an RGB image that has been loaded into a NumPy array:

img = imread('image1.jpg')

# Convert from RGB -> BGR
img = img[..., [2, 1, 0]]

While I understand the use of Ellipsis for slicing NumPy arrays, I couldn't understand its use here. Could anybody explain exactly what is happening?

Solution

tl;dr

img[..., [2, 1, 0]] produces the same result as taking the slices img[:, :, i] for each i in the index array [2, 1, 0], and then stacking the results along the last dimension of img. In other words:

img[..., [2,1,0]]

will produce the same output as:

np.stack([img[:,:,2], img[:,:,1], img[:,:,0]], axis=2)

The ellipsis ... is a placeholder that tells numpy which axis to apply the index array to. Without the ... the index array will be applied to the first axis of img instead of the last. Thus, without ..., the index statement:

img[[2,1,0]]

will produce the same output as:

np.stack([img[2,:,:], img[1,:,:], img[0,:,:]], axis=0)
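If you want to check both equivalences yourself, here is a minimal sketch. It assumes a small made-up array in place of a real image (the (4, 5, 3) shape is just an illustration, not something from your code):

```python
import numpy as np

# Hypothetical stand-in for a loaded image: height 4, width 5, 3 channels.
img = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# Index array applied to the last axis (with the ellipsis).
last_axis = img[..., [2, 1, 0]]
stacked_last = np.stack([img[:, :, 2], img[:, :, 1], img[:, :, 0]], axis=2)
print(np.array_equal(last_axis, stacked_last))    # True

# Index array applied to the first axis (no ellipsis).
first_axis = img[[2, 1, 0]]
stacked_first = np.stack([img[2, :, :], img[1, :, :], img[0, :, :]], axis=0)
print(np.array_equal(first_axis, stacked_first))  # True
```

Note that the two results have different shapes here: (4, 5, 3) with the ellipsis and (3, 5, 3) without it.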

What the docs say

This is an example of what the docs call "Combining advanced and basic indexing":

When there is at least one slice (:), ellipsis (...) or np.newaxis in the index (or the array has more dimensions than there are advanced indexes), then the behaviour can be more complicated. It is like concatenating the indexing result for each advanced index element.

It goes on to describe that in this

case, the dimensions from the advanced indexing operations [in your example [2, 1, 0]] are inserted into the result array at the same spot as they were in the initial array (the latter logic is what makes simple advanced indexing behave just like slicing).
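One way to see what "inserted at the same spot" means is to look only at shapes. This is a small sketch using an arbitrary array of zeros and an arbitrary two-element index array (both are assumptions made for illustration):

```python
import numpy as np

a = np.zeros((4, 5, 6))
idx = [0, 2]  # arbitrary index array of length 2

print(a[idx].shape)       # (2, 5, 6): axis 0 is replaced by len(idx)
print(a[:, idx].shape)    # (4, 2, 6): axis 1 is replaced by len(idx)
print(a[..., idx].shape)  # (4, 5, 2): the last axis is replaced by len(idx)
```

In each case the axis that received the index array is the one whose length changes, and it stays in the same position in the result.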

The 2D case

The docs aren't the easiest to understand, but in this case it's not too hard to pick apart. Start with a simpler 2D case:

arr = np.arange(12).reshape(4,3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

Using the same kind of advanced indexing with a single index value yields:

arr[:, [1]]

array([[ 1],
       [ 4],
       [ 7],
       [10]])

which is the column at index 1 of arr (that is, its second column). In other words, it's like you yielded all possible values from arr while holding the index of the last axis fixed. Like @hpaulj said in his comment, the ellipsis is there to act as a placeholder. It effectively tells numpy to iterate freely over all of the axes except for the last, to which the indexing array is applied.

You can also use this indexing syntax to shuffle the columns of arr around however you'd like:

arr[..., [1,0,2]]

array([[ 1,  0,  2],
       [ 4,  3,  5],
       [ 7,  6,  8],
       [10,  9, 11]])

This is essentially the same operation as in your example, but on a 2D array instead of a 3D one.

You can explain what's going on with arr[..., [1,0,2]] by breaking it down to simpler indexing ops. It's kind of like you first take the return value of arr[..., [1]]:

array([[ 1],
       [ 4],
       [ 7],
       [10]])

then the return value of arr[..., [0]]:

array([[0],
       [3],
       [6],
       [9]])

then the return value of arr[..., [2]]:

array([[ 2],
       [ 5],
       [ 8],
       [11]])

and then finally concatenate all of those results into a single array of shape (*arr.shape[:-1], len(ix)), where ix = [1, 0, 2] is the index array. The data along the last axis are ordered according to their order in ix.
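The breakdown for arr[..., [1,0,2]] can be written out directly. This is only a sketch of the mental model, not how numpy implements the indexing internally; the names pieces and rebuilt are made up for the example:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)
ix = [1, 0, 2]

# One single-column piece per entry of ix; each has shape (4, 1).
pieces = [arr[..., [i]] for i in ix]

# Glue the pieces together along the last axis, in the order given by ix.
rebuilt = np.concatenate(pieces, axis=-1)

print(rebuilt.shape == (*arr.shape[:-1], len(ix)))  # True
print(np.array_equal(rebuilt, arr[..., ix]))        # True
```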

One good way to understand exactly what the ellipsis is doing is to perform the same op without it:

arr[[1,0,2]]

array([[3, 4, 5],
       [0, 1, 2],
       [6, 7, 8]])

In this case, the index array is applied to the first axis of arr, so the output is an array containing rows 1, 0 and 2 of arr, in that order. Adding an ... before the index array tells numpy to apply the index array to the last axis of arr instead.
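If it helps, you can make the choice of axis explicit with np.take, which accepts an axis argument. The sketch below compares it against both indexing forms for the same index array (the names rows and cols are just for illustration):

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)
ix = [1, 0, 2]

rows = arr[ix]       # index array applied to axis 0: picks rows 1, 0, 2
cols = arr[..., ix]  # index array applied to the last axis: reorders columns

print(np.array_equal(rows, np.take(arr, ix, axis=0)))   # True
print(np.array_equal(cols, np.take(arr, ix, axis=-1)))  # True
```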

Your 3D case

The case you asked about is the 3D equivalent of the 2D arr[..., [1,0,2]] example above. Say that img.shape is (480, 640, 3). You can think about img[..., [2, 1, 0]] as looping over each value i in ix=[2, 1, 0]. For every i, the indexing operation will gather the slab of shape (480, 640, 1) that lies at index i of the last axis of img. Once all three slabs are collected, the final result will be the equivalent of concatenating them along their last axis, in the order given by ix.
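Here is that loop spelled out as a sketch. Since I don't have your image, it assumes a randomly generated uint8 array of shape (480, 640, 3) in its place; slabs and bgr are names made up for the example:

```python
import numpy as np

# Hypothetical stand-in for an RGB image of shape (480, 640, 3).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

ix = [2, 1, 0]

# Gather one slab per channel index; each slab keeps a trailing axis of length 1.
slabs = [img[..., [i]] for i in ix]
print(slabs[0].shape)  # (480, 640, 1)

# Concatenate the slabs along the last axis, in the order given by ix.
bgr = np.concatenate(slabs, axis=-1)
print(np.array_equal(bgr, img[..., ix]))  # True

# The old first channel is now the last one, and vice versa.
print(np.array_equal(bgr[..., 0], img[..., 2]))  # True
```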

Notes

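The difference in the values returned by arr[..., [1]] and arr[:,1] is that arr[..., [1]] keeps the indexed axis, giving shape (4, 1), while arr[:,1] drops it, giving shape (4,). In addition, arr[..., [1]] returns a copy (advanced indexing), whereas arr[:,1] returns a view (basic indexing).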
For a 2D array, arr[:, [1]] is equivalent to arr[..., [1]]. : acts as a placeholder just like ..., but only for a single dimension.
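Both notes are easy to check. The sketch below uses the same 2D arr as above, plus a made-up 3D array (cube) to show where : and ... stop being interchangeable:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

kept = arr[..., [1]]  # list index: the axis is kept
dropped = arr[:, 1]   # integer index: the axis is dropped

print(kept.shape, dropped.shape)                   # (4, 1) (4,)
print(np.array_equal(kept[:, 0], dropped))         # True: same values
print(np.array_equal(arr[:, [1]], arr[..., [1]]))  # True for a 2D array

# On a 3D array a single ':' covers only one axis, so the two forms differ.
cube = np.arange(24).reshape(2, 3, 4)
print(cube[:, [1]].shape)    # (2, 1, 4): index array lands on axis 1
print(cube[..., [1]].shape)  # (2, 3, 1): index array lands on the last axis
```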