Tags: arrays, python-3.x, numpy, multidimensional-array, argmax

Is there a way to find the UNIQUE row indices of maximum columnar values in a 2D NumPy array?


For each column in a 2D NumPy array, the column's maximum value can appear more than once. I would like to pick one row index per column at which that column attains its maximum, such that no row index is used for more than one column.

Here is an example that demonstrates why np.argmax doesn't work:

import numpy as np

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

ind = np.argmax(a, axis=0)

print(ind)

Output:

[0 0 2]

I want the result: [1, 0, 2] for this example.

That is:

  • The row index for the second column must be 0
  • This implies that the row index for the first column must be 1
  • This in turn implies that the row index for the third column must be 2
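The chain of implications above can be made visible by listing, for each column, every row that holds the column maximum. This is only an illustrative sketch; the names `is_max` and `candidates` are not part of the original question.

```python
import numpy as np

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

# True wherever an entry equals the maximum of its own column
is_max = a == a.max(axis=0)

# For each column, the row indices that hold that column's maximum
candidates = [np.flatnonzero(is_max[:, c]).tolist() for c in range(a.shape[1])]
print(candidates)  # [[0, 1], [0], [1, 2]]
```

The second column has a single candidate (row 0), which removes row 0 from the first column's options and leaves row 1, which in turn leaves only row 2 for the third column.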

A slightly more complex example is this array:

a = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1]])

In this case, there is no column with a unique maximum value. I'd be happy with either of these answers:

  • [0, 1, 2]
  • [1, 0, 2]

An even more complex example is:

a = np.array([[1, 1, 1],
              [1, 1, 1],
              [0, 1, 1]])

In this case, I'd be happy with any of these answers:

  • [0, 1, 2]
  • [0, 2, 1]
  • [1, 0, 2]
  • [1, 2, 0]

I can solve these problems with loops and logical conditions, but is there a way to solve them using NumPy functions?


Solution

  • This may be overkill, but you can treat it as an assignment problem and use scipy.optimize.linear_sum_assignment:

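    import numpy as np
    from scipy.optimize import linear_sum_assignment
    
    a = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 0, 1]])
    
    # Rows of a.T are the columns of a; negating the values turns the
    # cost minimizer into a maximizer. Element [1] of the result holds,
    # for each column of a, the row index assigned to it.
    linear_sum_assignment(-a.T)[1]
    # array([1, 0, 2])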
    

    Note that you can always reduce the problem to the 0/1 case using something like

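    # Cast to int so the mask can be negated like `a` above
    # (NumPy does not support unary minus on boolean arrays)
    abin = (a == a.max(axis=0)).astype(int)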
    

    This can speed up the assignment quite a bit.
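As a minimal sketch, the binarization and the assignment step can be combined into one helper. The function name `unique_argmax_rows_lsa` is hypothetical, and the sketch assumes a SciPy version in which `linear_sum_assignment` accepts the `maximize` keyword (SciPy 1.4 or later). It also checks that every column really received a row holding its maximum, since the solver returns a best-effort assignment even when no perfect one exists.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def unique_argmax_rows_lsa(a):
    """Return one distinct row index per column at which the column is maximal.

    Hypothetical helper, not part of the original answer.
    """
    a = np.asarray(a)
    # 1 where an entry equals its column's maximum, 0 elsewhere
    abin = (a == a.max(axis=0)).astype(int)
    # Rows of abin.T are the columns of a, so `cols` indexes columns of a
    # and `rows` holds the row of a assigned to each of those columns.
    cols, rows = linear_sum_assignment(abin.T, maximize=True)
    # Fail loudly if some column is unassigned or got a non-maximal row
    if len(cols) < a.shape[1] or not abin[rows, cols].all():
        raise ValueError("no assignment gives every column a distinct maximal row")
    return rows


a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])
result = unique_argmax_rows_lsa(a)
print(result)  # [1 0 2]
```

For inputs with several valid answers, such as the second and third arrays in the question, which of the valid answers is returned depends on the solver.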

    Alternatively, see this post for a graph theory solution.
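The linked post is not reproduced here. As one possible graph-based approach (an assumption, not necessarily what that post does), the problem can be viewed as a bipartite matching between columns and the rows that hold their maxima, solved with simple augmenting paths. The name `unique_argmax_rows_matching` and its internals are hypothetical.

```python
import numpy as np


def unique_argmax_rows_matching(a):
    """Match every column to a distinct row holding that column's maximum.

    Illustrative augmenting-path bipartite matching; hypothetical helper.
    """
    a = np.asarray(a)
    is_max = a == a.max(axis=0)        # is_max[r, c]: row r is maximal in column c
    n_rows, n_cols = a.shape
    row_owner = [-1] * n_rows          # row_owner[r] = column currently using row r

    def try_assign(c, seen):
        # Try each candidate row of column c, re-routing the current
        # owner of a row to another candidate row when possible.
        for r in np.flatnonzero(is_max[:, c]):
            r = int(r)
            if r in seen:
                continue
            seen.add(r)
            if row_owner[r] == -1 or try_assign(row_owner[r], seen):
                row_owner[r] = c
                return True
        return False

    for c in range(n_cols):
        if not try_assign(c, set()):
            raise ValueError("no assignment gives every column a distinct maximal row")

    result = np.empty(n_cols, dtype=int)
    for r, c in enumerate(row_owner):
        if c != -1:
            result[c] = r
    return result


a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])
print(unique_argmax_rows_matching(a))  # [1 0 2]
```

This version needs only NumPy, but it still loops in Python, so it is closer to the "loops and logical conditions" the question hoped to avoid than the SciPy call is.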