Tags: python, arrays, numpy, matrix, scientific-computing

fancy indexing a numpy matrix: one element per row


I have a 2D NumPy array, matrix, of shape (m, n). My actual use case has m ~ 1e5 and n ~ 100, but here is a small minimal example:

import numpy as np
matrix = np.arange(5*3).reshape((5, 3))  # m = 5 rows, n = 3 columns

I have an indexing array of integers, idx, of shape (m,), with each entry in the range [0, n). This array specifies which column should be selected from each row of matrix.

idx = np.array([2, 0, 2, 1, 1])

So I am trying to select column 2 from row 0, column 0 from row 1, column 2 from row 2, column 1 from row 3, and column 1 from row 4. Thus the final answer should be:

correct_result = np.array((2, 3, 8, 10, 13))
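To make the intended selection concrete, here is a slow but unambiguous reference written as a plain Python loop. It is only a sketch for checking faster versions against; the helper name `select_per_row_loop` is hypothetical.

```python
import numpy as np

def select_per_row_loop(matrix, idx):
    """Reference version (hypothetical helper): pick matrix[i, idx[i]] for every row i."""
    out = np.empty(matrix.shape[0], dtype=matrix.dtype)
    for i, j in enumerate(idx):
        out[i] = matrix[i, j]
    return out

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])
print(select_per_row_loop(matrix, idx))  # [ 2  3  8 10 13]
```

This runs one Python-level iteration per row, which is exactly what a vectorized indexing expression should avoid at m ~ 1e5.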

I tried the following, which is intuitive but incorrect:

incorrect_result = matrix[:, idx]

The above syntax applies idx as a fancy index along the column axis only: every row gets all of the columns listed in idx, so the result is an array of shape (m, m) rather than the m values I want.
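A quick shape check shows what `matrix[:, idx]` actually produces. Because element `[i, k]` of that result is `matrix[i, idx[k]]`, the wanted values sit on its diagonal. `np.diagonal(matrix[:, idx])` is therefore correct but wasteful: it builds an m × m intermediate, on the order of 1e10 elements for m ~ 1e5.

```python
import numpy as np

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])

incorrect_result = matrix[:, idx]
print(incorrect_result.shape)  # (5, 5): every column listed in idx, for every row

# incorrect_result[i, k] == matrix[i, idx[k]], so the wanted entry
# matrix[i, idx[i]] is the diagonal element incorrect_result[i, i].
print(np.diagonal(incorrect_result))  # [ 2  3  8 10 13]
```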

What is the correct syntax for fancy indexing of this type?


Solution

  • correct_result = matrix[np.arange(m), idx]
    

    The advanced indexing expression matrix[I, J], where I and J are integer arrays of the same shape, gives an output such that output[k] == matrix[I[k], J[k]].

    If we want output[k] == matrix[k, idx[k]] for every row k, then we need I[k] == k and J[k] == idx[k], so I must be np.arange(m) and J must be idx.
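Here is the answer as a runnable sketch on the question's data. It also shows `np.take_along_axis` (NumPy 1.15+), which performs the same one-index-per-row selection. That function needs `idx` reshaped to a column of shape (m, 1), and the trailing axis is dropped afterwards. The random arrays at the end are synthetic and only mimic the stated m ~ 1e5, n ~ 100 scale. They are not data from the question.

```python
import numpy as np

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])
m = matrix.shape[0]

# Two 1-D index arrays of length m are paired elementwise:
# result[k] == matrix[np.arange(m)[k], idx[k]] == matrix[k, idx[k]].
correct_result = matrix[np.arange(m), idx]
print(correct_result)  # [ 2  3  8 10 13]

# Equivalent alternative: take_along_axis needs indices with the same ndim
# as matrix, so reshape idx to (m, 1) and take column 0 of the (m, 1) result.
alt_result = np.take_along_axis(matrix, idx[:, None], axis=1)[:, 0]
print(np.array_equal(correct_result, alt_result))  # True

# Same indexing at roughly the question's real scale (synthetic data).
rng = np.random.default_rng(0)
big = rng.random((100_000, 100))
big_idx = rng.integers(0, big.shape[1], size=big.shape[0])
picked = big[np.arange(big.shape[0]), big_idx]
print(picked.shape)  # (100000,)
```

Both forms avoid the m × m intermediate and touch only m elements. The `np.arange(m)` version is the more direct reading of the I/J rule above.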