For each column in a 2D NumPy array, the column's maximum value can appear more than once. I would like to find the row index for each column maximum, without repeating row indices.
Here is an example that demonstrates why np.argmax doesn't work:
import numpy as np
a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])
ind = np.argmax(a, axis=0)
print(ind)
Output:
[0 0 1]
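I want the result [1, 0, 2] for this example. That is, column 1 has its maximum only in row 0, so column 0 must take row 1, which leaves row 2 for column 2.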
A slightly more complex example is this array:
a = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1]])
In this case, there is no column with a unique maximum value. I'd be happy with either of these answers:
[0, 1, 2]
[1, 0, 2]
An even more complex example is:
a = np.array([[1, 1, 1],
              [1, 1, 1],
              [0, 1, 1]])
In this case, I'd be happy with any of these answers:
[0, 1, 2]
[0, 2, 1]
[1, 0, 2]
[1, 2, 0]
I can solve these problems with loops and logical conditions, but I'm wondering if there is a way to solve the problem using numpy functions?
This may be overkill, but you can use scipy.optimize.linear_sum_assignment:
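import numpy as np
from scipy.optimize import linear_sum_assignment
a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])
# Rows of -a.T are the columns of a; minimizing -a maximizes the selected values.
linear_sum_assignment(-a.T)[1]
# array([1, 0, 2])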
Note that you can always reduce to the 0,1 case using something like
abin = a == a.max(axis=0)
This can speed up the assignment quite a bit.
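As a rough sketch of how the 0/1 reduction might be combined with the assignment step: the helper name unique_argmax_rows is hypothetical, and the maximize=True keyword assumes a reasonably recent SciPy (1.4 or later, as far as I know). The boolean mask is cast to int first, because NumPy does not allow unary minus on boolean arrays.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def unique_argmax_rows(a):
    """Hypothetical helper: one distinct row per column, each at a column maximum."""
    a = np.asarray(a)
    # True where an entry equals the maximum of its column.
    abin = a == a.max(axis=0)
    # Rows of the transposed matrix are the columns of `a`;
    # maximizing the sum picks as many column maxima as possible.
    cols, rows = linear_sum_assignment(abin.T.astype(int), maximize=True)
    # Fail loudly if some column could not get a distinct row holding its maximum.
    if cols.size != a.shape[1] or not abin[rows, cols].all():
        raise ValueError("no assignment of distinct rows to column maxima exists")
    return rows

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])
print(unique_argmax_rows(a))  # [1 0 2]
```

The explicit check at the end is a design choice of this sketch: linear_sum_assignment always returns some assignment, so without the check an input with no valid answer would silently yield rows that do not hold the column maximum.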
Alternatively, see this post for a graph theory solution.
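For what it's worth, one way a graph-based approach could look (not necessarily what the linked post does) is to treat rows and columns as the two sides of a bipartite graph, with an edge wherever an entry equals its column maximum, and then ask for a maximum matching. The sketch below assumes scipy.sparse.csgraph.maximum_bipartite_matching is available in your SciPy version.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])
# Edge (row i, column j) exists where a[i, j] is the maximum of column j.
graph = csr_matrix((a == a.max(axis=0)).astype(int))
# perm_type='row': entry j is the row matched to column j, or -1 if unmatched.
ind = maximum_bipartite_matching(graph, perm_type='row')
print(ind)  # [1 0 2]
```

Any -1 in the result would mean that column could not be given its own row, so no complete answer exists for that input.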