Selecting specific rows and columns from NumPy array


I've been going crazy trying to figure out what stupid thing I'm doing wrong here.

I'm using NumPy, and I have specific row indices and specific column indices that I want to select from. Here's the gist of my problem:

import numpy as np

a = np.arange(20).reshape((5,4))
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15],
#        [16, 17, 18, 19]])

# If I select certain rows, it works
print(a[[0, 1, 3], :])
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [12, 13, 14, 15]])

# If I select certain rows and a single column, it works
print(a[[0, 1, 3], 2])
# array([ 2,  6, 14])

# But if I select certain rows AND certain columns, it fails
print(a[[0, 1, 3], [0, 2]])
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# ValueError: shape mismatch: objects cannot be broadcast to a single shape

Why is this happening? Surely I should be able to select the 1st, 2nd, and 4th rows, and 1st and 3rd columns? The result I'm expecting is:

a[[0,1,3], [0,2]] => [[0,  2],
                      [4,  6],
                      [12, 14]]

Solution

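  • Fancy indexing broadcasts the index arrays for all dimensions against each other, so their shapes must be compatible. You are passing 3 indices for the first dimension and only 2 for the second, and shapes (3,) and (2,) cannot be broadcast together, hence the error. To get the 3x2 block you want, each dimension needs a full 3x2 array of indices, like this: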

    >>> a[[[0, 0], [1, 1], [3, 3]], [[0,2], [0,2], [0, 2]]]
    array([[ 0,  2],
           [ 4,  6],
           [12, 14]])
    

    That is of course a pain to write, so you can let broadcasting do the work: a 3x1 column of row indices broadcasts against the 2 column indices to the same 3x2 shape:

    >>> a[[[0], [1], [3]], [0, 2]]
    array([[ 0,  2],
           [ 4,  6],
           [12, 14]])
    

    This is much simpler if you index with arrays instead of lists, because `[:, None]` turns the row indices into a column for you:

    >>> row_idx = np.array([0, 1, 3])
    >>> col_idx = np.array([0, 2])
    >>> a[row_idx[:, None], col_idx]
    array([[ 0,  2],
           [ 4,  6],
           [12, 14]])
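
As a further sketch along the same lines (not part of the original answer), NumPy's `np.ix_` helper builds the broadcastable index arrays for you. The variable names `rows`, `cols`, `sub` and `sub_chained` below are only illustrative. Selecting rows first and then columns is another option that should give the same values, at the cost of an intermediate copy:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
rows = [0, 1, 3]   # illustrative names, not from the question
cols = [0, 2]

# np.ix_ reshapes the two index lists to shapes (3, 1) and (1, 2),
# which broadcast to a (3, 2) grid of (row, column) pairs.
ix = np.ix_(rows, cols)
sub = a[ix]

# Alternative: select the rows first, then the columns of that result.
sub_chained = a[rows][:, cols]

print(sub)
# [[ 0  2]
#  [ 4  6]
#  [12 14]]
```

Both forms return a new array rather than a view, so writing into `sub` does not modify `a`; to assign into the selected block, index on the left-hand side instead, e.g. `a[np.ix_(rows, cols)] = 0`.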