I have a 2d numpy array, matrix, of shape (m, n). My actual use-case has m ~ 1e5 and n ~ 100, but for the sake of having a simple minimal example:
import numpy as np
matrix = np.arange(5*3).reshape((5, 3))
I have an indexing array of integers, idx, of shape (m,), with each entry in the range [0, n). This array specifies which column should be selected from each row of matrix.
idx = np.array([2, 0, 2, 1, 1])
So, I am trying to select column 2 from row 0, column 0 from row 1, column 2 from row 2, column 1 from row 3, and column 1 from row 4. Thus the final answer should be:
correct_result = np.array((2, 3, 8, 10, 13))
I have tried the following, which is intuitive, but incorrect:
incorrect_result = matrix[:, idx]
The above syntax applies idx as a fancy indexing array to every row, so each row yields len(idx) values. The result is a matrix of shape (m, m) rather than a 1-D array of length m, which is not what I want.
What is the correct syntax for fancy indexing of this type?
m = matrix.shape[0]
correct_result = matrix[np.arange(m), idx]
The advanced indexing expression matrix[I, J] gives an output such that output[n] == matrix[I[n], J[n]]. If we want output[n] == matrix[n, idx[n]], then we need I[n] == n and J[n] == idx[n], so we need I to be np.arange(m) and J to be idx.
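As a quick check, here is a minimal sketch that runs the paired-index expression on the small example from the question. The names result, alt, and wrong are only illustrative, and the np.take_along_axis variant is an alternative I am adding for comparison (it assumes NumPy 1.15 or newer), not something the question asked for.

```python
import numpy as np

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])
m = matrix.shape[0]

# Pair row k with column idx[k]: result[k] == matrix[k, idx[k]].
result = matrix[np.arange(m), idx]

# Alternative (illustrative): take_along_axis needs an index array with the
# same number of dimensions as matrix, so idx is reshaped to (m, 1) first.
alt = np.take_along_axis(matrix, idx[:, None], axis=1)[:, 0]

# For contrast, matrix[:, idx] computes wrong[k, j] == matrix[k, idx[j]],
# so the desired values sit on its diagonal.
wrong = matrix[:, idx]

assert np.array_equal(result, np.array([2, 3, 8, 10, 13]))
assert np.array_equal(alt, result)
assert np.array_equal(np.diag(wrong), result)
```

The paired-index form allocates only the length-m output, whereas building matrix[:, idx] and taking its diagonal would create an m-by-m intermediate, which matters when m is around 1e5.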